Abstract. In quantum optics, the quantum state of a light beam is represented through the
Wigner function, a density on $\mathbb{R}^2$ which may take negative values but must respect intrinsic
positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne
tomography with efficiency parameter $1/2 < \eta \le 1$, we study the theoretical performance
of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a
logarithmic factor in the sample size, for the $L_\infty$-risk over a class of infinitely differentiable
functions. We also compute the lower bound for the $L_2$-risk. We construct an adaptive estimator, i.e. one
which does not depend on the smoothness parameters, and prove that it attains the minimax rates for the
corresponding smoothness class of functions. The finite-sample behaviour of our adaptive procedure
is explored through numerical experiments.

Keywords: Non-parametric minimax estimation, Adaptive estimation, Inverse problem, $L_2$ and
sup-norm risk, Quantum homodyne tomography, Wigner function, Radon transform, Quantum state

This paper deals with a severely ill-posed inverse problem which comes from quantum optics.
Quantum optics is a branch of quantum mechanics which studies physical systems at the atomic
and subatomic scales. Unlike in classical mechanics, the result of a physical measurement is
generally random. Quantum mechanics does not predict a deterministic course of events, but rather
the probabilities of various alternative possible events. It provides predictions on the outcomes
of measurements; exploiting measurements therefore involves non-trivial statistical methods, and
inference on the result of a measurement should be done on identically prepared quantum systems.

To understand our statistical model, we start in Section 1 with a short introduction to the needed
quantum notions. Section 2 introduces the statistical model by making the link with quantum
theory. The interested reader can get further acquaintance with quantum concepts through the
textbooks or the review articles of Helstrom (1976); Holevo (1982); Barndorff-Nielsen, Gill and Jupp
(2003) and Leonhardt (1997).


1. Physical background 

In quantum mechanics, the measurable properties (e.g. spin, energy, position, ...) of a quantum
system are called "observables". The probability of obtaining each of the possible outcomes when
measuring an observable is encoded in the quantum state of the considered physical system.

1.1. Quantum state and observable. The mathematical description of the quantum state of
a system is given in the form of a density operator $\rho$ on a complex Hilbert space $\mathcal{H}$ (called
the space of states) satisfying the following three conditions:

(1) Self-adjoint: $\rho = \rho^*$, where $\rho^*$ is the adjoint of $\rho$.

(2) Positive: $\rho \ge 0$, or equivalently $\langle \psi, \rho\psi \rangle \ge 0$ for all $\psi \in \mathcal{H}$.

(3) Trace one: $\mathrm{Tr}(\rho) = 1$.

Notice that $\mathcal{D}(\mathcal{H})$, the set of density operators $\rho$ on $\mathcal{H}$, is a convex set. The extreme
points of the convex set $\mathcal{D}(\mathcal{H})$ are called pure states and all other states are called mixed states.
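
As a concrete illustration of conditions (1)-(3) and of the convexity remark, the following minimal sketch checks them for 2 x 2 matrices in plain Python. The helper names (`is_density_matrix_2x2`, `mix`) and the example states are illustrative assumptions, not part of the paper; positivity is tested through the criterion "non-negative diagonal and determinant", which is only valid in this toy dimension.

```python
# Toy check of the three density-operator conditions for 2x2 complex matrices.
# All names here are hypothetical helpers introduced for illustration only.


def is_density_matrix_2x2(rho, tol=1e-12):
    """Return True if rho (nested lists) is self-adjoint, positive and has trace one."""
    (a, b), (c, d) = rho
    a, b, c, d = complex(a), complex(b), complex(c), complex(d)
    self_adjoint = (abs(a.imag) < tol and abs(d.imag) < tol
                    and abs(b - c.conjugate()) < tol)
    trace_one = abs(a + d - 1.0) < tol
    # A 2x2 Hermitian matrix is positive semi-definite iff its diagonal
    # entries and its determinant are non-negative.
    det = (a * d - b * c).real
    positive = a.real >= -tol and d.real >= -tol and det >= -tol
    return self_adjoint and trace_one and positive


def mix(rho1, rho2, lam):
    """Convex combination lam * rho1 + (1 - lam) * rho2 of two 2x2 matrices."""
    return [[lam * rho1[i][j] + (1.0 - lam) * rho2[i][j] for j in range(2)]
            for i in range(2)]


pure_a = [[1.0, 0.0], [0.0, 0.0]]        # projector onto the first basis vector
pure_b = [[0.5, 0.5], [0.5, 0.5]]        # projector onto (1, 1)/sqrt(2)
mixed = mix(pure_a, pure_b, 0.3)         # a mixed state, obtained by convexity
not_a_state = [[1.0, 1.0], [1.0, 0.0]]   # trace one, but not positive

print(is_density_matrix_2x2(pure_a), is_density_matrix_2x2(mixed),
      is_density_matrix_2x2(not_a_state))
```

The convex combination of the two projectors passes all three checks, while the last matrix fails positivity although its trace equals one.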
In this paper, the quantum system we are interested in is a monochromatic light in a cavity. In
this setting of quantum optics, the space of states $\mathcal{H}$ we are dealing with is the space of square
integrable complex valued functions on the real line. A particular orthonormal basis of
this Hilbert space is the Fock basis $\{\psi_j\}_{j\in\mathbb{N}}$:

$$\psi_j(x) := \frac{1}{\sqrt{\sqrt{\pi}\,2^j j!}}\, H_j(x)\, e^{-x^2/2}, \tag{1}$$

where $H_j(x) := (-1)^j e^{x^2} \frac{d^j}{dx^j} e^{-x^2}$ denotes the $j$-th Hermite polynomial. In this basis, a quantum
state is described by an infinite density matrix $\rho = [\rho_{j,k}]_{j,k\in\mathbb{N}}$ whose entries are equal to

$$\rho_{j,k} = \langle \psi_j, \rho\psi_k \rangle,$$

with $\langle\cdot,\cdot\rangle$ the inner product. The quantum states which can currently be created in the
laboratory are matrices whose entries decrease exponentially to 0, i.e., they belong to the natural class
$\mathcal{R}(C,B,r)$ defined below, with $r = 2$. For $C \ge 1$, $B > 0$ and $0 < r \le 2$, the class
$\mathcal{R}(C,B,r)$ is defined as follows:

$$\mathcal{R}(C,B,r) := \left\{\rho \text{ quantum state} : |\rho_{m,n}| \le C\exp(-B(m+n)^{r/2})\right\}. \tag{2}$$
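
The orthonormality of the Fock basis (1), on which the matrix entries above rely, can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the Hermite polynomials are generated with the standard three-term recurrence $H_{j+1}(x) = 2xH_j(x) - 2jH_{j-1}(x)$, and the inner product is approximated by a trapezoid rule on $[-10, 10]$.

```python
import math


def hermite(j, x):
    """Physicists' Hermite polynomial H_j(x) via H_{m+1} = 2x H_m - 2m H_{m-1}."""
    h_prev, h = 1.0, 2.0 * x
    if j == 0:
        return h_prev
    for m in range(1, j):
        h_prev, h = h, 2.0 * x * h - 2.0 * m * h_prev
    return h


def psi(j, x):
    """Fock basis function psi_j(x) as in equation (1)."""
    norm = math.sqrt(math.sqrt(math.pi) * 2.0 ** j * math.factorial(j))
    return hermite(j, x) * math.exp(-x * x / 2.0) / norm


def inner(j, k, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoid approximation of the L2 inner product <psi_j, psi_k>."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * psi(j, x) * psi(k, x)
    return total * dx


print(inner(0, 0), inner(3, 3), inner(2, 3))
```

The first two values should be close to one and the last close to zero, in line with orthonormality.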


An example of density matrix of a pure state whose entries are real is given in Figure 1. 


Figure 1. The density matrix p of a coherent-3 state. 



In order to describe mathematically a measurement performed on an observable of a quantum
system prepared in state $\rho$, we give the mathematical description of an observable. An observable
$\mathbf{X}$ is a self-adjoint operator on the same space of states $\mathcal{H}$ and

$$\mathbf{X} = \sum_{a}^{\dim\mathcal{H}} x_a \mathbf{P}_a,$$

where the eigenvalues $\{x_a\}_a$ of the observable $\mathbf{X}$ are real and $\mathbf{P}_a$ is the projection onto the
one-dimensional space generated by the eigenvector of $\mathbf{X}$ corresponding to the eigenvalue $x_a$.

As a quantum state $\rho$ encompasses all the probabilities of the observables of the considered
quantum system, when performing a measurement of the observable $\mathbf{X}$ of a quantum state $\rho$, the
result is a random variable $X$ with values in the set of the eigenvalues of the observable $\mathbf{X}$. For a
quantum system prepared in state $\rho$, $X$ has the following probability distribution and expectation:

$$\mathbb{P}_\rho(X = x_a) = \mathrm{Tr}(\mathbf{P}_a\rho) \quad\text{and}\quad \mathbb{E}_\rho(X) = \mathrm{Tr}(\mathbf{X}\rho).$$

Note that the conditions defining the density matrix $\rho$ ensure that $\mathbb{P}_\rho$ is a probability distribution.
In particular, the characteristic function is given by

$$\mathbb{E}_\rho(e^{itX}) = \mathrm{Tr}(\rho e^{it\mathbf{X}}).$$
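
The rule $\mathbb{P}_\rho(X = x_a) = \mathrm{Tr}(\mathbf{P}_a\rho)$ can be made concrete on a two-level toy system. The sketch below is an assumption-laden illustration (the state `rho`, the observable with matrix [[0, 1], [1, 0]] and the helper name are all chosen for the example, not taken from the paper): it computes the outcome probabilities from the eigenvectors and checks that the resulting mean agrees with $\mathrm{Tr}(\mathbf{X}\rho)$.

```python
import math


def outcome_probability(rho, v):
    """Tr(P_v rho) = <v, rho v> for the projector P_v onto the unit vector v."""
    n = len(v)
    total = 0j
    for i in range(n):
        for j in range(n):
            total += complex(v[i]).conjugate() * rho[i][j] * v[j]
    return total.real


# Hypothetical toy state and observable X = [[0, 1], [1, 0]].
rho = [[0.65, 0.35], [0.35, 0.35]]
s = 1.0 / math.sqrt(2.0)
spectrum = {+1.0: [s, s], -1.0: [s, -s]}     # eigenvalue -> unit eigenvector of X

probs = {x_a: outcome_probability(rho, v) for x_a, v in spectrum.items()}
mean = sum(x_a * p for x_a, p in probs.items())
trace_x_rho = rho[1][0] + rho[0][1]          # Tr(X rho) for this particular X

print(probs, mean, trace_x_rho)
```

The probabilities sum to one because $\rho$ has trace one, and they are non-negative because $\rho$ is positive.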
1.2. Quantum homodyne tomography and Wigner function. In quantum optics, a monochromatic
light in a cavity is described by a quantum harmonic oscillator. In this setting, the
observables of interest are usually $\mathbf{Q}$ and $\mathbf{P}$ (resp. the electric and magnetic fields). But according to
Heisenberg's uncertainty principle, $\mathbf{Q}$ and $\mathbf{P}$ are non-commuting observables, so they may not be
simultaneously measurable. Therefore, by performing measurements on $(\mathbf{Q},\mathbf{P})$, we cannot get a
probability density of the result $(Q,P)$. However, for every phase $\phi \in [0,\pi]$ we can measure the
quadrature observables

$$\mathbf{X}_\phi := \mathbf{Q}\cos\phi + \mathbf{P}\sin\phi.$$

Each of these quadratures can be measured on a laser beam by a technique put into practice for the
first time by Smithey and called Quantum Homodyne Tomography (QHT). The theoretical
foundation of quantum homodyne tomography was outlined by Vogel and Risken (1989).


Figure 2. QHT measurement scheme. 



The experimental set-up, described in Figure 2, consists of mixing the signal field with a local
oscillator field (LO) of high intensity $|z| \gg 1$. The phase $\Phi$ of the LO is chosen such that
$\Phi \sim \mathcal{U}[0,\pi]$.

The resulting beam is split by a 50-50 beam splitter, and the photodetectors count the photons
in the two output beams by giving integrated currents $I_1$ and $I_2$ proportional to the number of
photons. The result of the measurement is produced by taking the difference of the two currents
and rescaling it by the intensity $|z|$. In the case of noiseless measurement and for a phase $\Phi = \phi$,
the result $X_\phi = (I_1 - I_2)/|z|$ has density $p_\rho(\cdot|\phi)$ corresponding to measuring $\mathbf{X}_\phi$.

In other words, when performing a QHT measurement of the observable $\mathbf{X}_\phi$ of the quantum state
$\rho$, the result is a random variable $X_\phi$ whose density conditionally on $\Phi = \phi$ is denoted by $p_\rho(\cdot|\phi)$.
Its characteristic function is given by

$$\mathbb{E}_\rho(e^{itX_\phi}) = \mathrm{Tr}(\rho e^{it\mathbf{X}_\phi}) = \mathrm{Tr}(\rho e^{it(\mathbf{Q}\cos\phi + \mathbf{P}\sin\phi)}) = \mathcal{F}_1[p_\rho(\cdot|\phi)](t),$$

where $\mathcal{F}_1[p_\rho(\cdot|\phi)](t) = \int e^{itx} p_\rho(x|\phi)\,dx$ denotes the Fourier transform with respect to the first
variable. Moreover if $\Phi$ is chosen uniformly on $[0,\pi]$, the joint probability density of $(X_\phi,\Phi)$ with
respect to the Lebesgue measure on $\mathbb{R}\times[0,\pi]$ is

$$p_\rho(x,\phi) = \frac{1}{\pi}\, p_\rho(x|\phi)\,\mathbb{1}_{[0,\pi]}(\phi).$$

An equivalent representation for a quantum state $\rho$ is the function $W_\rho : \mathbb{R}^2 \to \mathbb{R}$ called the Wigner
function, introduced for the first time by Wigner (1932). The Wigner function may be obtained
from the momentum representation

$$\widetilde{W}_\rho(u,v) := \mathcal{F}_2[W_\rho](u,v) = \mathrm{Tr}\left(\rho e^{i(u\mathbf{Q} + v\mathbf{P})}\right), \tag{3}$$

where $\mathcal{F}_2$ is the Fourier transform with respect to both variables. By applying the change of variables
$(u,v)$ into $(t\cos\phi, t\sin\phi)$, we get

$$\widetilde{W}_\rho(t\cos\phi, t\sin\phi) = \mathcal{F}_1[p_\rho(\cdot|\phi)](t) = \mathrm{Tr}(\rho e^{it\mathbf{X}_\phi}). \tag{4}$$

The name quantum homodyne tomography comes from the fact that the procedure
described above is similar to positron emission tomography (PET), where the density of
the observations is the Radon transform of the underlying distribution:

$$p_\rho(x|\phi) = \mathcal{R}[W_\rho](x,\phi) = \int W_\rho(x\cos\phi + t\sin\phi,\ x\sin\phi - t\cos\phi)\,dt, \tag{5}$$

where $\mathcal{R}[W_\rho]$ denotes the Radon transform of $W_\rho$. The main difference with PET is that the role
of the unknown distribution is played by the Wigner function, which can be negative.

Physicists consider the Wigner function as a quasi-probability density of $(Q,P)$, as if one could
measure $(\mathbf{Q},\mathbf{P})$ simultaneously. Nevertheless, the Wigner function does not satisfy all the
properties of a conventional probability density but satisfies boundedness properties unavailable for
classical densities. For instance, the Wigner function can and normally does go negative for states
which have no classical model. The Wigner function is such that

$$W_\rho : \mathbb{R}^2 \to \mathbb{R}, \qquad \iint W_\rho(q,p)\,dq\,dp = 1. \tag{6}$$

The negative part of the Wigner function therefore makes its interpretation as a probability
density on phase space less intuitive. However, the Radon transform of the Wigner function
is always a probability density. Indeed, conditionally on $\Phi = \phi$ and by applying the change of
variables $(q,p)$ into $(x\cos\phi + t\sin\phi,\ x\sin\phi - t\cos\phi)$, we get

$$\begin{aligned}
1 &= \iint W_\rho(q,p)\,dq\,dp \\
  &= \iint W_\rho(x\cos\phi + t\sin\phi,\ x\sin\phi - t\cos\phi)\,dt\,dx \\
  &= \int \mathcal{R}[W_\rho](x,\phi)\,dx = \int p_\rho(x|\phi)\,dx.
\end{aligned}$$
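
The contrast between a Wigner function that goes negative and a Radon transform that remains a density can be observed numerically. The following sketch is illustrative only: it assumes the single-photon Wigner function normalised with a factor $1/\pi$ (chosen here so that (6) holds), approximates the line integral in (5) by a trapezoid rule, and compares it with the closed form $2x^2e^{-x^2}/\sqrt{\pi}$ obtained by integrating this particular function by hand. All function names are hypothetical.

```python
import math


def wigner_single_photon(q, p):
    """Single-photon Wigner function, normalised (assumption) to integrate to one."""
    r2 = q * q + p * p
    return -(1.0 - 2.0 * r2) * math.exp(-r2) / math.pi


def radon(wigner, x, phi, t_max=8.0, steps=1600):
    """Trapezoid approximation of the line integral in equation (5)."""
    dt = 2.0 * t_max / steps
    c, s = math.cos(phi), math.sin(phi)
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * dt
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * wigner(x * c + t * s, x * s - t * c)
    return total * dt


x, phi = 0.7, 1.1
numeric = radon(wigner_single_photon, x, phi)
closed_form = 2.0 * x * x * math.exp(-x * x) / math.sqrt(math.pi)
print(wigner_single_photon(0.0, 0.0), numeric, closed_form)
```

The value at the origin is negative, while the Radon transform is non-negative at every point tested.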


Note that the existence of negative values of the Wigner function can be taken as a
criterion to discriminate nonclassical states of the field. Figure 3 represents the
Wigner function of the vacuum state and of the nonclassical one-photon state.



Figure 3. The Wigner function of the single photon state (left) and the Wigner 
function of the vacuum state (right). 
In the Fock basis, we can write $W_\rho$ in terms of the density matrix $[\rho_{j,k}]$ as follows (see Leonhardt
(1997) for the details):

$$W_\rho(q,p) = \sum_{j,k} \rho_{j,k}\, W_{j,k}(q,p),$$

where for $j \ge k$,

$$W_{j,k}(q,p) = \frac{(-1)^j}{\pi}\left(\frac{k!}{j!}\right)^{1/2}\left(\sqrt{2}(ip - q)\right)^{j-k} e^{-(q^2+p^2)}\, L_k^{j-k}(2q^2 + 2p^2), \tag{7}$$

and $L_k^{\alpha}(x)$ is the Laguerre polynomial of degree $k$ and order $\alpha$.


1.3. Pattern functions. The ideal result of the QHT measurement provides $(X_\phi,\Phi)$, whose joint
probability density with respect to the Lebesgue measure on $\mathbb{R}\times[0,\pi]$ is equal to

$$p_\rho(x,\phi) = \frac{1}{\pi}\, p_\rho(x|\phi)\,\mathbb{1}_{[0,\pi]}(\phi) = \frac{1}{\pi}\,\mathcal{R}[W_\rho](x,\phi)\,\mathbb{1}_{[0,\pi]}(\phi). \tag{8}$$

The density $p_\rho(\cdot,\cdot)$ can be written in terms of the entries of the density matrix $\rho$ (see Leonhardt
(1997)):

$$p_\rho(x,\phi) = \sum_{j,k=0}^{\infty} \rho_{j,k}\,\psi_j(x)\psi_k(x)\,e^{-i(j-k)\phi}, \tag{9}$$

where $\{\psi_j\}_{j\in\mathbb{N}}$ is the Fock basis defined in (1). Conversely (see D'Ariano, Macchiavello and Paris
(1994); Leonhardt (1997) for details), we can write

$$\rho_{j,k} = \iint p_\rho(x,\phi)\, f_{j,k}(x)\, e^{i(j-k)\phi}\,dx\,d\phi, \tag{10}$$

where the functions $f_{j,k} : \mathbb{R}\to\mathbb{R}$ introduced by Leonhardt, Paul and D'Ariano (1995) are called
the "pattern functions". An explicit form of $f_{j,k}(\cdot)$ is given through its Fourier transform by Richter
(2000): for all $j \ge k$,

$$\widetilde{f}_{k,j}(t) = \widetilde{f}_{j,k}(t) = \pi(-i)^{j-k}\sqrt{\frac{2^{k-j}\,k!}{j!}}\;|t|\,t^{j-k}\,e^{-t^2/4}\,L_k^{j-k}\!\left(\frac{t^2}{2}\right), \tag{11}$$


where $L_k^{\alpha}(x)$ denotes the generalized Laguerre polynomial of degree $k$ and order $\alpha$. Note that by
writing $t = \|w\| = \|(q,p)\| = \sqrt{q^2+p^2}$ in equation (7), we can define

$$l_{j,k}(t) := |W_{j,k}(q,p)| = \frac{2^{(j-k)/2}}{\pi}\left(\frac{k!}{j!}\right)^{1/2} t^{j-k}\, e^{-t^2}\left|L_k^{j-k}(2t^2)\right|. \tag{12}$$

Therefore, there exists a useful relation: for all $j \ge k$,

$$|\widetilde{f}_{j,k}(t)| = \pi^2\,|t|\; l_{j,k}(t/2). \tag{13}$$


Moreover, Aubry, Butucea and Meziani (2009) have given the following lemma, which will be useful
to prove our main results.


Lemma 1 (Aubry, Butucea and Meziani (2009)).
For all $j,k \in \mathbb{N}$ and $J := j + k + 1$, for all $t \ge 0$,

$$l_{j,k}(t) \le \frac{1}{\pi}\begin{cases} 1 & \text{if } 0 \le t \le \sqrt{J}, \\ e^{-(t-\sqrt{J})^2} & \text{if } t \ge \sqrt{J}. \end{cases} \tag{14}$$
2. Statistical model


In practice, when one performs a QHT measurement (see Figure 2), a number of photons fail
to be detected. These losses may be quantified by one single coefficient $\eta \in [0,1]$, such that
$\eta = 0$ when there is no detection and $\eta = 1$ corresponds to the ideal case (no loss). The quantity
$(1 - \eta)$ represents the proportion of photons which are not detected due to various losses in the
measurement process. The parameter $\eta$ is supposed to be known, as physicists argue that their
machines actually have high detection efficiency, around $\eta = 0.9$. In this paper we consider
$\eta \in\, ]1/2,1]$. Moreover, as the detection process is inefficient, an independent Gaussian noise interferes
additively with the ideal data $X_\phi$. Note that the Gaussian nature of the noise is imposed by the
Gaussian nature of the vacuum state which interferes additively (see Figure 3).

To summarize, for $\Phi = \phi$, the effective result of the QHT measurement is, for a known efficiency
$\eta \in\, ]1/2,1]$,

$$Y = \sqrt{\eta}\,X_\phi + \sqrt{(1-\eta)/2}\;\xi, \tag{15}$$

where $\xi$ is a standard Gaussian random variable, independent of the random variable $X_\phi$ having
density, with respect to the Lebesgue measure on $\mathbb{R}\times[0,\pi]$, equal to $p_\rho(\cdot,\cdot)$ defined in equation (8).
For the sake of simplicity, we re-parametrize (15) as follows:

$$Z := Y/\sqrt{\eta} = X_\phi + \sqrt{(1-\eta)/(2\eta)}\;\xi =: X_\phi + \sqrt{2\gamma}\;\xi, \tag{16}$$

where $\gamma = (1-\eta)/(4\eta)$ is known and $\gamma \in [0,1/4[$ as $\eta \in\, ]1/2,1]$. Note that $\gamma = 0$ corresponds to
the ideal case.
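
The noise model (15)-(16) is straightforward to simulate. The sketch below is a minimal illustration under a stated assumption: the state is taken to be the vacuum, for which the ideal quadrature $X_\phi$ has density $\psi_0(x)^2 = e^{-x^2}/\sqrt{\pi}$, i.e. is $N(0,1/2)$ for every phase. The function names are hypothetical. The empirical variance of $Z$ should then be close to $1/2 + 2\gamma$.

```python
import math
import random


def gamma_from_eta(eta):
    """Noise parameter gamma = (1 - eta) / (4 eta) from the re-parametrization (16)."""
    return (1.0 - eta) / (4.0 * eta)


def simulate_vacuum_qht(n, eta, seed=0):
    """Draw n pairs (Z, Phi) for the vacuum state (assumption: X_phi ~ N(0, 1/2))."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        phi = rng.uniform(0.0, math.pi)
        x = rng.gauss(0.0, math.sqrt(0.5))
        xi = rng.gauss(0.0, 1.0)
        y = math.sqrt(eta) * x + math.sqrt((1.0 - eta) / 2.0) * xi   # equation (15)
        z = y / math.sqrt(eta)                                        # equation (16)
        samples.append((z, phi))
    return samples


eta = 0.9
gamma = gamma_from_eta(eta)
data = simulate_vacuum_qht(20000, eta, seed=1)
empirical_variance = sum(z * z for z, _ in data) / len(data)
print(gamma, empirical_variance, 0.5 + 2.0 * gamma)
```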

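Let us denote by $p_\rho^\gamma(\cdot,\cdot)$ the density of $(Z,\Phi)$, which is the convolution of the density of $X_\phi$ with
$N^\gamma(\cdot)$, the density of a centered Gaussian distribution having variance $2\gamma$, that is

$$p_\rho^\gamma(z,\phi) = \left[\frac{1}{\pi}\mathcal{R}[W_\rho](\cdot,\phi) * N^\gamma\right](z) = \left[p_\rho(\cdot,\phi) * N^\gamma\right](z). \tag{17}$$

For $\Phi = \phi$, a useful equation in the Fourier domain, deduced from the previous relation (17) and
equation (4), is

$$\mathcal{F}_1[p_\rho^\gamma(\cdot,\phi)](t) = \mathcal{F}_1[p_\rho(\cdot,\phi)](t)\,\widetilde{N}^\gamma(t) = \widetilde{W}_\rho(t\cos\phi, t\sin\phi)\,\widetilde{N}^\gamma(t), \tag{18}$$

where $\mathcal{F}_1$ denotes the Fourier transform with respect to the first variable and the Fourier transform
of $N^\gamma(\cdot)$ is $\widetilde{N}^\gamma(t) = e^{-\gamma t^2}$.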


This paper aims at reconstructing the Wigner function $W_\rho$ of a monochromatic light in a cavity
prepared in state $\rho$ from $n$ observations. As we cannot measure precisely the quantum state in
a single experiment, we perform measurements on $n$ independent identically prepared quantum
systems. The measurement carried out on each of the $n$ systems in state $\rho$ is done by QHT
as described in Section 1. In practice, the results of such experiments are $n$ independent
identically distributed random variables $(Z_1,\Phi_1),\ldots,(Z_n,\Phi_n)$ such that

$$Z_\ell := X_\ell + \sqrt{2\gamma}\;\xi_\ell, \tag{19}$$

with values in $\mathbb{R}\times[0,\pi]$ and distribution $\mathbb{P}_\rho^\gamma$ having density with respect to the Lebesgue measure
on $\mathbb{R}\times[0,\pi]$ equal to $p_\rho^\gamma(\cdot,\cdot)$ defined in (17). For all $\ell = 1,\ldots,n$, the $\xi_\ell$'s are independent standard
Gaussian random variables, independent of all $(X_\ell,\Phi_\ell)$.

In order to study the theoretical performance of our different procedures, we use the fact that
the unknown Wigner function belongs to the class of very smooth functions $\mathcal{A}(\beta,r,L)$ (similar to
those of Butucea, Guţă and Artiles (2007); Aubry, Butucea and Meziani (2009)) described via its
Fourier transform:

$$\mathcal{A}(\beta,r,L) := \left\{ f : \mathbb{R}^2\to\mathbb{R},\ \iint |\widetilde{f}(u,v)|^2\, e^{2\beta\|(u,v)\|^r}\,du\,dv \le (2\pi)^2 L \right\}, \tag{20}$$

where $\widetilde{f}(\cdot,\cdot)$ denotes the Fourier transform with respect to both variables and $\|(u,v)\| = \sqrt{u^2+v^2}$
denotes the usual Euclidean norm. Note that this class is reasonable from a physical point of
view, as the realistic class $\mathcal{R}(C,B,r)$ of density matrices defined in (2) has been translated in terms
of Wigner functions by Aubry, Butucea and Meziani (2009). They prove that the fast decay of the
elements of the density matrix implies rapid decay of both the Wigner function and its Fourier
transform.

Outline of the results. The problem of reconstructing the quantum state of a light beam has
been extensively studied in the physics literature and in quantum statistics. We mention only papers
with a theoretical analysis of the performance of their estimation procedure; many other references
to physics papers can be found therein. Methods for reconstructing a quantum state are based on
the estimation of either the density matrix $\rho$ or the Wigner function $W_\rho$. In order to compute the
performance of a procedure, a realistic class of quantum states $\mathcal{R}(C,B,r)$, in which the elements of
the density matrix decrease rapidly, has been defined in many papers as in (2). From the physical
point of view, all the states which have been produced in the laboratory to date belong to such
a class with $r = 2$, and a more detailed argument can be found in the paper of Butucea, Guţă and
Artiles (2007).

The estimation of the density matrix from averages of data has been considered in the framework
of ideal detection ($\eta = 1$, i.e. $\gamma = 0$) by Artiles, Gill and Guţă (2005), while the noisy setting was
investigated by Aubry, Butucea and Meziani (2009) for the Frobenius-norm risk. More recently in
the noisy setting, an adaptive estimation procedure over the classes of quantum states $\mathcal{R}(C,B,r)$,
i.e. without assuming the knowledge of the regularity parameters, has been proposed by Alquier,
Meziani and Peyre (2013) and an upper bound for the Frobenius-norm risk has been given. The
problem of goodness-of-fit testing in quantum statistics has been considered in Meziani (2008). In
this noisy setting, the latter paper derived a testing procedure from a projection-type estimator
where the projection is done in $L_2$ distance on some suitably chosen pattern functions.

Note that we may capture some features of the quantum states more easily on the Wigner function
$W_\rho$, for instance the fact that the quantum state is non-classical when this function has significant
negative parts. Aubry, Butucea and Meziani (2009) translate the class $\mathcal{R}(C,B,r)$ in terms of
rapid decay of the Fourier transform of the associated Wigner functions, as defined in (20) by the
class $\mathcal{A}(\beta,r,L)$. Over this class with $r = 1$ and for the problem of pointwise estimation of the
Wigner function, when no noise is present, we mention the work of Guţă and Artiles (2007). They
propose a kernel estimator and derive sharp minimax results over this class.

This paper deals with the problem of reconstructing the Wigner function $W_\rho$ in the context of
QHT when taking into account the detection losses occurring in the measurement, leading to an
additional Gaussian noise in the measurement data ($\eta \in\, ]1/2,1]$). The same problem in the noisy
setting was treated by Butucea, Guţă and Artiles (2007), who obtain minimax rates for the pointwise
risk over the class $\mathcal{A}(\beta,r,L)$ for the procedure defined in (21). Moreover, a truncated version
of their estimator is proposed by Aubry, Butucea and Meziani (2009), where an upper bound is
computed for the $L_2$ risk over the class $\mathcal{A}(\beta,r,L)$. The estimation of a quadratic functional of the
Wigner function, as an estimator of the purity, was explored in Meziani (2007).

The remainder of the article is organized as follows. In Section 3, we establish in Theorem 1 the
first sup-norm risk upper bound for the estimation procedure (21) of the Wigner function, while in
Theorem 2 we establish the first minimax lower bounds for the estimation of the Wigner function
for the quadratic and the sup-norm risks. These results match our sup-norm upper bound
up to a logarithmic factor in the sample size $n$.

We propose in Section 4 a Lepski-type procedure that adapts to the unknown smoothness
parameters $\beta > 0$ and $r \in\, ]0,2]$ of the Wigner function of interest. The only previous result on adaptation
is due to Butucea, Guţă and Artiles (2007) but concerns the simplest case $r \in\, ]0,1[$, where the
estimation procedure (21) with a proper choice of the parameter $h$ independent of $\beta, r$ is naturally
minimax adaptive up to a logarithmic factor in the sample size $n$. Theoretical investigations are
complemented by numerical experiments reported in Section 5. The proofs of the main results are
deferred to the Appendix.

3. Wigner function estimation and minimax risk

From now on, we work in the practical framework and we assume that $n$ independent identically
distributed random pairs $(Z_i,\Phi_i)_{i=1,\ldots,n}$ are observed, where $\Phi_i$ is uniformly distributed in $[0,\pi]$
and the joint density of $(Z_i,\Phi_i)$ is $p_\rho^\gamma(\cdot,\cdot)$ (see (17)). As in Butucea, Guţă and Artiles (2007), we
modify the usual tomography kernel in order to take into account the additive noise on the
observations and construct a kernel $K_h^\gamma$ which asymptotically performs both deconvolution and inverse
Radon transform on our data, such that our estimation procedure is

$$\widehat{W}_h^\gamma(z) = \frac{1}{2\pi n}\sum_{\ell=1}^{n} K_h^\gamma\left([z,\Phi_\ell] - Z_\ell\right), \tag{21}$$

where $0 \le \gamma < 1/4$ is a fixed parameter and $h > 0$ tends to 0 when $n \to \infty$ in a proper way to be
chosen later. The kernel is defined through its Fourier transform by

$$\widetilde{K}_h^\gamma(t) = |t|\,e^{\gamma t^2}\,\mathbb{1}_{|t|\le 1/h}, \tag{22}$$

where $z = (q,p)$ and $[z,\phi] = q\cos\phi + p\sin\phi$.
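
To see the kernel (22) and an estimator of the form (21) at work, the following sketch evaluates the estimate at the origin on simulated noisy vacuum data. It is only an illustration under explicit assumptions: the kernel is obtained from (22) with the inversion convention $K(u) = \frac{1}{2\pi}\int \widetilde{K}(t)e^{-itu}\,dt$; the leading constant $1/(2n)$ is fixed here by requiring that the estimate recovers $W_\rho$ as $h \to 0$ when $\Phi$ is uniform on $[0,\pi]$ under that convention, and other normalisation conventions shift it by a constant factor; the vacuum quadratures are taken to be $N(0,1/2)$. For this state the expectation of the estimate at the origin is $(1 - e^{-1/(4h^2)})/\pi$, which tends to $W_\rho(0,0) = 1/\pi$. All function names are hypothetical.

```python
import math
import random


def kernel(u, h, gamma, steps=200):
    """K_h^gamma(u) = (1/pi) * int_0^{1/h} t exp(gamma t^2) cos(t u) dt (trapezoid rule)."""
    a = 1.0 / h
    dt = a / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * t * math.exp(gamma * t * t) * math.cos(t * u)
    return total * dt / math.pi


def wigner_estimate(q, p, data, h, gamma):
    """Kernel estimate at z = (q, p); the constant 1/(2n) is an assumption of this sketch."""
    acc = 0.0
    for z, phi in data:
        acc += kernel(q * math.cos(phi) + p * math.sin(phi) - z, h, gamma)
    return acc / (2.0 * len(data))


def simulate_vacuum(n, gamma, seed=0):
    """Noisy vacuum data: X ~ N(0, 1/2) for every phase, Z = X + sqrt(2 gamma) xi."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        phi = rng.uniform(0.0, math.pi)
        x = rng.gauss(0.0, math.sqrt(0.5))
        data.append((x + math.sqrt(2.0 * gamma) * rng.gauss(0.0, 1.0), phi))
    return data


gamma = (1.0 - 0.9) / (4.0 * 0.9)            # efficiency eta = 0.9
data = simulate_vacuum(2000, gamma, seed=1)
h = 0.5
estimate = wigner_estimate(0.0, 0.0, data, h, gamma)
smoothed_target = (1.0 - math.exp(-1.0 / (4.0 * h * h))) / math.pi
print(estimate, smoothed_target, 1.0 / math.pi)
```

The gap between `smoothed_target` and $1/\pi$ is the bias caused by truncating the frequencies at $1/h$; decreasing $h$ reduces it at the price of a stochastic term that grows like $e^{\gamma h^{-2}}$.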

From now on, $\|\cdot\|_\infty$, $\|\cdot\|_2$ and $\|\cdot\|_1$ will denote respectively the sup-norm, the $L_2$-norm and the
$L_1$-norm. The sup-norm risk can be trivially bounded as follows:

$$\|\widehat{W}_h^\gamma - W_\rho\|_\infty \le \|\widehat{W}_h^\gamma - \mathbb{E}[\widehat{W}_h^\gamma]\|_\infty + \|\mathbb{E}[\widehat{W}_h^\gamma] - W_\rho\|_\infty. \tag{23}$$

In order to study the sup-norm risk of our procedure $\widehat{W}_h^\gamma$, we study in Propositions 1 and 2,
respectively, the bias term and the stochastic term.

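Proposition 1. Let $\widehat{W}_h^\gamma$ be the estimator of $W_\rho$ defined in (21) and let $h > 0$ tend to 0 when $n \to \infty$.
Then,

$$\|\mathbb{E}[\widehat{W}_h^\gamma] - W_\rho\|_\infty \le \sqrt{\frac{\pi L}{(2\pi)^2\beta r}}\; h^{(r-2)/2}\, e^{-\beta h^{-r}}(1 + o(1)),$$

where $W_\rho \in \mathcal{A}(\beta,r,L)$ defined in (20) and $r \in\, ]0,2]$.

The proof is deferred to Appendix A.1.

Proposition 2. Let $\widehat{W}_h^\gamma$ be the estimator of $W_\rho$ defined in (21) and $0 < h < 1$. Then, there exists
a constant $C_1$, depending only on $\gamma$, such that

$$\mathbb{E}\left[\|\widehat{W}_h^\gamma - \mathbb{E}[\widehat{W}_h^\gamma]\|_\infty\right] \le C_1\, e^{\gamma h^{-2}}\sqrt{\frac{1}{n}}. \tag{24}$$

Moreover, for any $x > 0$, we have with probability at least $1 - e^{-x}$ that

$$\|\widehat{W}_h^\gamma - \mathbb{E}[\widehat{W}_h^\gamma]\|_\infty \le C_2\, e^{\gamma h^{-2}}\max\left(\sqrt{\frac{1+x}{n}},\ \frac{1+x}{n}\right), \tag{25}$$

where $C_2 > 0$ depends only on $\gamma$.

The proof is deferred to Appendix A.2. The following theorem establishes the upper bound of the
sup-norm risk.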

Theorem 1. Assume that $W_\rho$ belongs to the class $\mathcal{A}(\beta,r,L)$ defined in (20) for some $r \in\, ]0,2]$
and $\beta, L > 0$. Consider the estimator (21) with $h^* = h^*(r)$ such that

$$\begin{cases} \dfrac{\beta}{(h^*)^r} + \dfrac{\gamma}{(h^*)^2} = \dfrac{1}{2}\log(n) & \text{if } 0 < r < 2, \\[2mm] h^* = \left(\dfrac{2(\beta+\gamma)}{\log n}\right)^{1/2} & \text{if } r = 2. \end{cases} \tag{26}$$

Then we have

$$\mathbb{E}\left[\|\widehat{W}_{h^*}^\gamma - W_\rho\|_\infty\right] \le C\, v_n(r),$$

where $C > 0$ can depend only on $\gamma, \beta, r, L$ and the rate of convergence $v_n$ is such that

$$v_n(r) = \begin{cases} (h^*)^{(r-2)/2}\, e^{-\beta (h^*)^{-r}} & \text{if } 0 < r < 2, \\[1mm] n^{-\frac{\beta}{2(\beta+\gamma)}} & \text{if } r = 2. \end{cases} \tag{27}$$

Note that for $r \in\, ]0,2[$ the rate of convergence $v_n$ is faster than any logarithmic rate in the sample
size but slower than any polynomial rate. For $r = 2$, the rate of convergence is polynomial in the
sample size.
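
The bandwidth $h^*$ in (26) is only defined implicitly when $0 < r < 2$, but it is easy to compute numerically because the left-hand side of the equation is decreasing in $h$. The sketch below is a hedged illustration (function names and parameter values are assumptions made for the example): it solves (26) by bisection on a logarithmic scale and evaluates the rate (27).

```python
import math


def optimal_bandwidth(n, beta, gamma, r, iters=200):
    """Solve beta / h^r + gamma / h^2 = log(n) / 2 for h, as in equation (26)."""
    if r == 2.0:
        return math.sqrt(2.0 * (beta + gamma) / math.log(n))
    target = 0.5 * math.log(n)
    lo, hi = 1e-8, 1e8
    for _ in range(iters):
        mid = math.sqrt(lo * hi)              # bisection on the log scale
        if beta * mid ** (-r) + gamma * mid ** (-2) > target:
            lo = mid                          # left-hand side too large: increase h
        else:
            hi = mid
    return math.sqrt(lo * hi)


def rate(n, beta, gamma, r):
    """Rate v_n(r) of equation (27)."""
    if r == 2.0:
        return n ** (-beta / (2.0 * (beta + gamma)))
    h = optimal_bandwidth(n, beta, gamma, r)
    return h ** ((r - 2.0) / 2.0) * math.exp(-beta * h ** (-r))


n, beta, gamma = 10 ** 5, 1.0, 1.0 / 36.0     # illustrative values only
for r in (0.5, 1.0, 1.5, 2.0):
    print(r, optimal_bandwidth(n, beta, gamma, r), rate(n, beta, gamma, r))
```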

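Proof of Theorem 1: Taking the expectation in (23) and using Propositions 1 and 2, we get for
all $0 < h < 1$

$$\begin{aligned}
\mathbb{E}\left[\|\widehat{W}_h^\gamma - W_\rho\|_\infty\right] &\le \mathbb{E}\left[\|\widehat{W}_h^\gamma - \mathbb{E}[\widehat{W}_h^\gamma]\|_\infty\right] + \|\mathbb{E}[\widehat{W}_h^\gamma] - W_\rho\|_\infty \\
&\le \frac{C e^{\gamma h^{-2}}}{\sqrt{n}}(1 + o(1)) + C_B\, h^{(r-2)/2}\, e^{-\beta h^{-r}}(1 + o(1)),
\end{aligned}$$

where $C_B = \sqrt{\frac{\pi L}{(2\pi)^2\beta r}}$, $h \to 0$ as $n \to \infty$ and $W_\rho \in \mathcal{A}(\beta,r,L)$. The optimal bandwidth parameter
$h^*(r) := h^*$ is such that

$$h^* = \arg\inf_{h>0}\left\{ C_B\, h^{(r-2)/2}\, e^{-\beta h^{-r}} + \frac{C e^{\gamma h^{-2}}}{\sqrt{n}} \right\}. \tag{28}$$

Therefore, by taking the derivative, we get

$$\frac{\beta}{(h^*)^r} + \frac{\gamma}{(h^*)^2} = \frac{1}{2}\log(n) + C(1 + o(1)).$$

By plugging the result in (28), for $0 < r < 2$ we have

$$(h^*)^{(r-2)/2}\, e^{-\beta (h^*)^{-r}} = (h^*)^{(r-2)/2}\,\frac{e^{\gamma (h^*)^{-2}}}{\sqrt{n}}.$$

It follows that the bias term is much larger than the stochastic term for $0 < r < 2$. It is easy to
see that for $r = 2$, we have $h^* = \left(\frac{2(\beta+\gamma)}{\log n}\right)^{1/2}$ and that the bias term and the stochastic term
are of the same order. □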


We now derive a minimax lower bound. We consider specifically the case $r = 2$ since it is relevant
for quantum physics applications, but our results can easily be generalized to the case $0 < r < 2$,
to which similar arguments apply. The only known lower bound
result for the estimation of a Wigner function is due to Butucea, Guţă and Artiles (2007) and
concerns the pointwise risk. In Theorem 2 below, we obtain the first minimax lower bounds for
the estimation of a Wigner function $W_\rho \in \mathcal{A}(\beta,2,L)$ with the quadratic and sup-norm risks.

Theorem 2. Assume that $(Z_1,\Phi_1),\ldots,(Z_n,\Phi_n)$ come from the model (16) with $\gamma \in [0,1/4[$.

Then, for any $\beta, L > 0$ and $r = 2$ there exists a constant $c := c(\beta,L,\gamma) > 0$ such that for $n$ large
enough


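$$\inf_{\widehat{W}_n}\ \sup_{W_\rho\in\mathcal{A}(\beta,2,L)} \mathbb{E}\|\widehat{W}_n - W_\rho\|_p \ \ge\ \begin{cases} c\, n^{-\frac{\beta}{2(\beta+\gamma)}}\log^{-3/2}(n) & \text{if } p = \infty, \\[1mm] c\, n^{-\frac{\beta}{\beta+\gamma}} & \text{if } p = 2, \end{cases}$$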


where the infimum is taken over all possible estimators $\widehat{W}_n$ based on the i.i.d. sample $\{(Z_i,\Phi_i)\}_{i=1}^{n}$.

The proof is deferred to Appendix B. This theorem guarantees that the sup-norm upper bound
derived in Theorem 1 and the quadratic risk upper bound in the paper of Aubry, Butucea and
Meziani (2009) are minimax optimal up to a logarithmic factor in the sample size. We believe
that the logarithmic factors in both cases are artefacts of the proofs.
4. Adaptation to the smoothness

As we see in (28), the optimal choice of the bandwidth $h^*$ depends on the unknown smoothness $\beta$.
For any $0 < r \le 2$, we propose here to implement a Lepski-type procedure to select an adaptive
bandwidth $h$. We will show that the estimator obtained with this bandwidth achieves the
optimal minimax rate up to a logarithmic factor. Our adaptive procedure is implemented in Section 5.


Let $M \ge 2$ and let $0 < h_M < \cdots < h_1 < 1$ be a grid of $]0,1[$; we build estimators $\widehat{W}_{h_m}^\gamma$ associated with
bandwidth $h_m$ for any $1 \le m \le M$. For any fixed $x > 0$, let us define

$$r_n(x) = \max\left(\sqrt{\frac{1+x}{n}},\ \frac{1+x}{n}\right).$$

We denote by $\mathcal{L}_\kappa(\cdot)$ the Lepski functional such that

$$\mathcal{L}_\kappa(m) = \max_{j>m}\left\{\|\widehat{W}_{h_m}^\gamma - \widehat{W}_{h_j}^\gamma\|_\infty - 2\kappa\, e^{\gamma h_j^{-2}}\, r_n(x + \log M)\right\} + 2\kappa\, e^{\gamma h_m^{-2}}\, r_n(x + \log M), \tag{29}$$

where $\kappa > 0$ is a fixed constant. Our final adaptive estimator, denoted by $\widehat{W}_{h_{\widehat{m}}}^\gamma$, is
the estimator defined in (21) for the bandwidth $h_{\widehat{m}}$, where $\widehat{m}$ is such that

$$\widehat{m} = \arg\min_{1\le m\le M} \mathcal{L}_\kappa(m). \tag{30}$$
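
The selection rule (29)-(30) only needs the pairwise sup-norm distances between the estimators and the penalty levels. The following sketch shows one way it could be coded; it is not the authors' toolbox. The function names, the toy distance matrix and the toy penalties are assumptions made for illustration, indices are 0-based (index 0 corresponds to $h_1$), and the maximum over an empty set (the case $m = M$) is taken to be 0 by convention in this sketch.

```python
import math


def r_n(x, n):
    """r_n(x) = max(sqrt((1 + x) / n), (1 + x) / n)."""
    return max(math.sqrt((1.0 + x) / n), (1.0 + x) / n)


def penalties(bandwidths, gamma, n, x, kappa):
    """Penalty 2 kappa exp(gamma / h_m^2) r_n(x + log M) for each bandwidth."""
    level = r_n(x + math.log(len(bandwidths)), n)
    return [2.0 * kappa * math.exp(gamma / h ** 2) * level for h in bandwidths]


def lepski_select(sup_dist, pen):
    """Minimise the Lepski functional (29).

    sup_dist[m][j] is the sup-norm distance between estimators m and j,
    pen[m] is the penalty attached to bandwidth h_m (decreasing in h).
    """
    M = len(pen)
    values = []
    for m in range(M):
        gaps = [sup_dist[m][j] - pen[j] for j in range(m + 1, M)]
        # Assumption of this sketch: an empty maximum (last index) counts as 0.
        values.append((max(gaps) if gaps else 0.0) + pen[m])
    best = min(range(M), key=lambda m: values[m])
    return best, values


# Toy inputs (hypothetical numbers, only meant to exercise the rule).
toy_dist = [[0.0, 0.1, 0.5],
            [0.1, 0.0, 0.4],
            [0.5, 0.4, 0.0]]
toy_pen = [0.05, 0.1, 0.3]
best, values = lepski_select(toy_dist, toy_pen)
print(best, values)
```

In practice `sup_dist` would be filled with the maxima of $|\widehat{W}_{h_m}^\gamma - \widehat{W}_{h_j}^\gamma|$ over the evaluation grid, and `toy_pen` replaced by the output of `penalties`.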

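Theorem 3. Assume that $W_\rho \in \mathcal{A}(\beta,r,L)$. Take $\kappa > 0$ sufficiently large and $M \ge 2$. Choose
$0 < h_M < \cdots < h_1 < 1$. Then, for the bandwidth $h_{\widehat{m}}$ with $\widehat{m}$ defined in (30) and for any $x > 0$,
we have with probability at least $1 - e^{-x}$

$$\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty \le C \min_{1\le m\le M}\left\{ h_m^{r/2-1}\, e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}}\, r_n(x + \log M) \right\}, \tag{31}$$

where $C > 0$ is a constant depending only on $\gamma, \beta, r, L$.
In addition, we have in expectation

$$\mathbb{E}\left[\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty\right] \le C' \min_{1\le m\le M}\left\{ h_m^{r/2-1}\, e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}}\, r_n(\log M) \right\}, \tag{32}$$

where $C' > 0$ is a constant depending only on $\gamma, \beta, r, L$.


The proof is deferred to Appendix C.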


The idea is now to build a sufficiently fine grid $0 < h_M < \cdots < h_1 < 1$ to achieve the optimal rate
of convergence simultaneously over $r \in (0,2]$ and $\beta > 0$. Take $M = \lfloor\sqrt{\log n/(2\gamma)}\rfloor$. We consider
the following grid for the bandwidth parameter $h$:

$$h_1 = 1/2, \qquad h_m = \frac{1}{2}\left(1 - (m-1)\sqrt{\frac{2\gamma}{\log n}}\right), \quad 1 \le m \le M. \tag{33}$$
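
The grid (33) can be generated in a few lines. The sketch below is illustrative (the function name and the numerical values are assumptions; it requires $\gamma > 0$ so that $M$ is finite) and also checks that the smallest bandwidth stays above $(\gamma/(2\log n))^{1/2}$, the lower end of the range used in the proof of Corollary 1.

```python
import math


def bandwidth_grid(n, gamma):
    """Grid (33): h_m = (1/2) * (1 - (m - 1) * sqrt(2 gamma / log n)), m = 1..M."""
    if gamma <= 0.0:
        raise ValueError("this sketch assumes gamma > 0")
    step = math.sqrt(2.0 * gamma / math.log(n))
    M = int(math.floor(1.0 / step))           # M = floor(sqrt(log n / (2 gamma)))
    return [0.5 * (1.0 - (m - 1) * step) for m in range(1, M + 1)]


n, gamma = 10 ** 5, 1.0 / 36.0                # eta = 0.9, illustrative sample size
grid = bandwidth_grid(n, gamma)
print(len(grid), grid[0], grid[-1], math.sqrt(gamma / (2.0 * math.log(n))))
```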

We build the corresponding estimators $\widehat{W}_{h_m}^\gamma$ and we apply the Lepski procedure (29)-(30) to obtain
the estimator $\widehat{W}_{h_{\widehat{m}}}^\gamma$. The next result guarantees that this estimator is minimax adaptive over the
class

$$\Omega := \left\{(\beta,r,L),\ \beta > 0,\ 0 < r \le 2,\ L > 0\right\}.$$


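Corollary 1. Let the conditions of Theorem 3 be satisfied. Then the estimator $\widehat{W}_{h_{\widehat{m}}}^\gamma$ for the bandwidth
$h_{\widehat{m}}$ with $\widehat{m}$ defined in (30) satisfies, for any $(\beta,r,L) \in \Omega$,

$$\limsup_{n\to\infty}\ \sup_{W_\rho\in\mathcal{A}(\beta,r,L)} \mathbb{E}\left[\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty\right] \le C\, v_n(r),$$

where $v_n(r)$ is the rate defined in (27) and $C$ is a positive constant depending only on $r$, $L$, $\beta$ and
$\gamma$.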

Proof of Corollary 1: First note that for all $m = 1,\ldots,M$ and as

$$h_m \in\, ](\gamma/(2\log n))^{1/2},\ 1/2],$$



MINIMAX AND ADAPTIVE ESTIMATION OF THE WIGNER FUNCTION IN QHT




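the bias term $h_m^{r/2-1} e^{-\beta h_m^{-r}}$ is larger than the stochastic term $e^{\gamma h_m^{-2}} r_n(\log M)$ up to a numerical
constant. Let us define

$$\bar{m} := \arg\min_{1\le m\le M}\left\{|h_m - h^*| : h_m \le h^*\right\},$$

where $\bar{m}$ is well defined as

$$\frac{h_M}{h^*} = \frac{(1/2)\left(1 - M(2\gamma/\log n)^{1/2} + (2\gamma/\log n)^{1/2}\right)}{\left(\log n/(2\gamma) - (\beta/\gamma)(h^*)^{-r}\right)^{-1/2}}
= \frac{1}{2}\left(1 - M + ((\log n)/(2\gamma))^{1/2}\right)\left(1 - (2\beta/\log n)(h^*)^{-r}\right)^{1/2}.$$

Moreover, as $0 \le ((\log n)/(2\gamma))^{1/2} - M < 1$ we get

$$\frac{h_M}{h^*} \le \left(1 - (2\beta/\log n)(h^*)^{-r}\right)^{1/2} \le 1.$$

Therefore, from (32),

$$\mathbb{E}\left[\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty\right] \le C\, h_{\bar{m}}^{r/2-1}\, e^{-\beta h_{\bar{m}}^{-r}} \le C\, h_{\bar{m}}^{r/2-1}\, e^{-\beta h_{\bar{m}}^{-r}}\, v_n(r)\, v_n(r)^{-1}
= C\left(\frac{h_{\bar{m}}}{h^*}\right)^{r/2-1} e^{-\beta\left(h_{\bar{m}}^{-r} - (h^*)^{-r}\right)}\, v_n(r).$$

By the definition of $\bar{m}$, it follows that $h_{\bar{m}}^{-r} \ge (h^*)^{-r}$, and then

$$\mathbb{E}\left[\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty\right] \le C\left(\frac{h_{\bar{m}}}{h^*}\right)^{r/2-1} v_n(r) = C\left(\frac{h_{\bar{m}} - h^*}{h^*} + 1\right)^{r/2-1} v_n(r).$$

By construction $|h_{\bar{m}} - h^*| \le (\gamma/(2\log n))^{1/2}$, so we have

$$\mathbb{E}\left[\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty\right] \le C\left(1 - \frac{(\gamma/(2\log n))^{1/2}}{h^*}\right)^{r/2-1} v_n(r).$$

As $(h^*)^{-1} \le (\log n/(2\gamma))^{1/2}$, it holds that $1 - \frac{(\gamma/(2\log n))^{1/2}}{h^*} \ge 1/2$. Therefore, as $r/2 - 1 \le 0$, the result
follows:

$$\mathbb{E}\left[\|\widehat{W}_{h_{\widehat{m}}}^\gamma - W_\rho\|_\infty\right] \le C\, v_n(r). \qquad □$$

5. Experimental evaluation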


We test our method on two examples of Wigner functions, corresponding to the single-photon
and the Schrödinger's cat states, which are respectively defined as

$$W_\rho(q,p) = -(1 - 2(q^2 + p^2))\,e^{-q^2-p^2},$$

$$W_\rho(q,p) = \frac{1}{2}\, e^{-(q-q_0)^2-p^2} + \frac{1}{2}\, e^{-(q+q_0)^2-p^2} + \cos(2q_0 p)\, e^{-q^2-p^2}.$$

We used $q_0 = 3$ in our numerical tests. The toolbox to reproduce the numerical results of this
article is available online$^1$. Following the paper of Butucea, Guţă and Artiles (2007) and in order to
obtain a fast numerical procedure, we implemented the estimator $\widehat{W}_h^\gamma$ defined in (21) on a regular
grid. More precisely, 2-D functions such as $W_\rho$ are discretized on a fine 2-D grid of 256 × 256
points. We use the Fast Slant Stack Radon transform of Averbuch et al. (2008), which is both fast
and faithful to the continuous Radon transform $\mathcal{R}$. It also implements a fast pseudo-inverse which
accounts for the filtered back projection formula (21). The filtering against the 1-D kernel (22) is
computed along the radial rays in the Radon domain using fast Fourier transforms. We computed
the Lepski functional (29) using the values $x = \log(M)$ and $\kappa = 1$.
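
For readers who wish to experiment without the toolbox, the two test functions can be tabulated directly. The sketch below is a plain-Python illustration, not the authors' implementation: the function names and the coarse 101 × 101 grid on $[-5,5]^2$ are assumptions, and the formulas are coded as printed above, without additional normalising constants. It locates the most negative value of each function on the grid, which is the feature that marks both states as non-classical.

```python
import math

Q0 = 3.0   # value of q_0 used in the numerical tests


def wigner_single_photon(q, p):
    """Single-photon test function, coded as printed in the text."""
    r2 = q * q + p * p
    return -(1.0 - 2.0 * r2) * math.exp(-r2)


def wigner_cat(q, p, q0=Q0):
    """Schrodinger's cat test function, coded as printed in the text."""
    return (0.5 * math.exp(-(q - q0) ** 2 - p * p)
            + 0.5 * math.exp(-(q + q0) ** 2 - p * p)
            + math.cos(2.0 * q0 * p) * math.exp(-q * q - p * p))


def grid_min(wigner, half_width=5.0, points=101):
    """Minimum of a function over a regular points x points grid on a square."""
    step = 2.0 * half_width / (points - 1)
    axis = [-half_width + i * step for i in range(points)]
    return min(wigner(q, p) for q in axis for p in axis)


print(grid_min(wigner_single_photon), grid_min(wigner_cat))
```

The cat state is negative along the interference fringes near $q = 0$, for instance at $p = \pi/(2q_0)$ where $\cos(2q_0 p) = -1$.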


$^1$ https://github.com/gpeyre/2015-AOS-AdaptiveWigner
Figure 4. Single photon state estimation, with $\eta = 0.9$, $n = 100 \times 10^3$. Left,
top: display of $\|\widehat{W}_h^\gamma - W_\rho\|_\infty/\|W_\rho\|_\infty$ as a function of $1/h$. The central curve is the
mean of this quantity, while the shaded area displays ±2× the standard deviation of
this quantity. Left, bottom: histogram of the empirical distribution of $h_{\widehat{m}}$ computed by the
Lepski procedure (30). Center: display as a 2-D image using level sets of $W_\rho$ (top) and
$\widehat{W}_{h_{\widehat{m}}}^\gamma$ (bottom). Right: same, but displayed as an elevation surface.


Figure 5. Schrödinger's cat state estimation, with $\eta = 0.9$, $n = 500 \times 10^3$. We refer
to Figure 4 for the description of the plots.
Figures 4 and 5 report the numerical results of our method on both test cases. The left part
compares the error $\|\widehat W_h^\gamma - W_\rho\|_\infty$ (displayed as a function of h) to the parameters $h_{\hat m}$ selected
by the Lepski procedure (30). The error $\|\widehat W_h^\gamma - W_\rho\|_\infty$ (its empirical mean and its standard
deviation) is computed in an "oracle" manner (since for these examples, the Wigner function to
estimate $W_\rho$ is known) using 20 realizations of the sampling for each tested value of h. The
histogram of values $h_{\hat m}$ is computed by solving (29) for 20 realizations of the sampling. This
comparison shows, on both test cases, that the method is able to select a parameter value $h_{\hat m}$
which lies around the optimal parameter value (as indicated by the minimum of the $L^\infty$ error).
The central and right parts show graphical displays of $\widehat W^\gamma_{h_{\hat m}}$, where $\hat m$ is selected using the Lepski
procedure (30), for a given sampling realization.
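
The selection step described above can be sketched in a few lines. The following is only an illustrative sketch, not the authors' implementation: it assumes that the sup-norm distances between the estimators on a finite bandwidth grid have already been computed (the hypothetical input `sup_dist`), that the bandwidths are sorted in decreasing order, and that the functional has the form $B(m) + 2\kappa e^{\gamma h_m^{-2}} r_n(x + \log M)$ recalled in Appendix D.2. The convention $B(M) = 0$ for the finest bandwidth is an assumption made here for the empty maximum.

```python
import math

def r_n(x, n):
    """Threshold r_n(x) = max(sqrt((1+x)/n), (1+x)/n), as recalled in Appendix D.2."""
    u = (1.0 + x) / n
    return max(math.sqrt(u), u)

def lepski_select(sup_dist, bandwidths, gamma, n, x, kappa=1.0):
    """Return the index minimising B(m) + 2*kappa*exp(gamma/h_m^2)*r_n(x + log M).

    sup_dist[m][j] is the (precomputed, hypothetical) sup-norm distance between
    the estimators with bandwidths[m] and bandwidths[j]; bandwidths are sorted
    in decreasing order, so j > m means a smaller bandwidth.
    """
    M = len(bandwidths)
    pen = [2.0 * kappa * math.exp(gamma / h ** 2) * r_n(x + math.log(M), n)
           for h in bandwidths]
    best_m, best_val = 0, float("inf")
    for m in range(M):
        # B(m): largest penalised distance to an estimator with a finer bandwidth.
        # Assumption: B(m) = 0 when there is no finer bandwidth.
        b = max((sup_dist[m][j] - pen[j] for j in range(m + 1, M)), default=0.0)
        val = b + pen[m]
        if val < best_val:
            best_m, best_val = m, val
    return best_m
```

With x = log(M) and κ = 1, the values quoted for the experiments, the function returns the grid index of the selected bandwidth.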


Appendix A. Proof of Propositions 


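A.1. Proof of Proposition 1. First remark that, by the Fourier transform formula, for $w = (q,p) \in \mathbb{R}^2$ and $x = (x_1, x_2)$,

(34) $W_\rho(w) = \frac{1}{(2\pi)^2} \iint \widetilde W_\rho(x)\, e^{-i(q x_1 + p x_2)}\,dx.$

Let $\widehat W_h^\gamma$ be the estimator of $W_\rho$ defined in (21), then

$E[\widehat W_h^\gamma(w)] = \frac{1}{2\pi} E[K_h^\gamma([w,\Phi_1] - Z_1)] = \frac{1}{2\pi} \int_0^\pi\!\!\int K_h^\gamma([w,\phi] - z)\, p_\rho^\gamma(z,\phi)\,dz\,d\phi = \frac{1}{2\pi} \int_0^\pi K_h^\gamma * p_\rho^\gamma(\cdot,\phi)([w,\phi])\,d\phi.$

In the Fourier domain, the convolution becomes a product. As $\widetilde N^\gamma(t) = e^{-\gamma t^2}$, the definition (22) of the kernel combined with (18) gives

$E[\widehat W_h^\gamma(w)] = \frac{1}{(2\pi)^2} \int_0^\pi\!\!\int \widetilde K_h^\gamma(t)\, \widetilde W_\rho(t\cos\phi, t\sin\phi)\, \widetilde N^\gamma(t)\, e^{-it[w,\phi]}\,dt\,d\phi = \frac{1}{(2\pi)^2} \int_0^\pi\!\!\int_{|t| \le 1/h} |t|\, \widetilde W_\rho(t\cos\phi, t\sin\phi)\, e^{-it[w,\phi]}\,dt\,d\phi.$

Therefore, by the change of variable $x = (t\cos\phi, t\sin\phi)$, we obtain

(35) $E[\widehat W_h^\gamma(w)] = \frac{1}{(2\pi)^2} \int_{\|x\|_2 \le 1/h} \widetilde W_\rho(x)\, e^{-i(q x_1 + p x_2)}\,dx.$

From equations (34) and (35) and the Cauchy-Schwarz inequality, we have

$|E[\widehat W_h^\gamma(w)] - W_\rho(w)| \le \frac{1}{(2\pi)^2} \int_{\|x\|_2 > 1/h} |\widetilde W_\rho(x)|\,dx$

$\le \frac{1}{(2\pi)^2} \left( \int |\widetilde W_\rho(x)|^2 e^{2\beta \|x\|_2^r}\,dx \right)^{1/2} \left( \int_{\|x\|_2 > 1/h} e^{-2\beta \|x\|_2^r}\,dx \right)^{1/2}$

$\le \sqrt{\frac{L}{(2\pi)^2 \beta r}}\; h^{(r-2)/2} e^{-\beta h^{-r}} (1 + o(1)), \quad h \to 0,$

as $W_\rho \in A(\beta, r, L)$, the class defined in (20).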
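
The last step of the bias bound rests on the behaviour of the tail integral $\int_{1/h}^\infty t\,e^{-2\beta t^r}\,dt$ as $h \to 0$, which comes from passing to polar coordinates. As a hedged numerical illustration (the function names, the truncation of the upper limit and the midpoint rule are choices made here, not part of the paper), the sketch below compares this integral with the equivalent $h^{r-2} e^{-2\beta h^{-r}}/(2\beta r)$; for r = 2 the two coincide exactly.

```python
import math

def tail_integral(h, beta, r, upper_mult=6.0, steps=200_000):
    """Midpoint-rule value of int_{1/h}^{T} t*exp(-2*beta*t**r) dt with T = upper_mult/h.

    The upper limit is truncated (an assumption of this sketch); the integrand
    is negligible beyond T for the parameter values used below.
    """
    a = 1.0 / h
    b = upper_mult / h
    dt = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * dt
        total += t * math.exp(-2.0 * beta * t ** r)
    return total * dt

def leading_term(h, beta, r):
    """h^(r-2) * exp(-2*beta*h^(-r)) / (2*beta*r), the equivalent as h -> 0."""
    return h ** (r - 2) * math.exp(-2.0 * beta * h ** (-r)) / (2.0 * beta * r)
```

Taking the square root of the leading term gives the factor $h^{(r-2)/2} e^{-\beta h^{-r}}$ that appears in the bias bound.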
A.2. Proof of Proposition 2. The following lemma is needed to prove Proposition 2.
Lemma 2. Let $\delta_h := h^{-1} e^{\gamma h^{-2}} > 0$ for any $0 < h < 1$; then the class

(36) $\mathcal{H}_h = \{ \delta_h^{-1} K_h^\gamma(\cdot - t),\ t \in \mathbb{R} \}, \quad h > 0$

is uniformly bounded by $U := \frac{1}{2\pi\gamma}$. Moreover, for every $0 < \epsilon < A$ and for finite positive constants
A, v depending only on $\gamma$,

(37) $\sup_Q N(\epsilon, \mathcal{H}_h, L^2(Q)) \le (A/\epsilon)^v,$

where the supremum extends over all probability measures Q on $\mathbb{R}$.

The proof of this Lemma can be found in D.l. To prove (24), we have to bound the following 
quantity : 

"i 2 


E^M^Y/v/^l 2 ] < \\K\ 


x,<ira? = 




|f|e 7t dt. 


= [ y-V ' 1 " 2 --) < 


7 2 


(38) = 2 / fe 7 * dt 

Jo 

Moreover for Sh = h~ 1 e lh 2 , we have 

(39) ^ 2 E[|^(M]-y/07)| 2 ] < 

By Lemma 2, it follows that the class $\mathcal{H}_h$ is VC. Hence, we can apply (57) in the paper of Giné
and Nickl (2009) to get


E 


w:-nw//]\\oo 


= E sup 

z6R 2 


27t n 


E K h ([A 4 n\ -Z t )~ E [K/([z, </> e \ - Z t )} 


1 = 1 


= o- E SU P 

ZTrn 26K 2 


E ([a M - Zt) - E [6^Kl([z, <j> e \ - Zt)] 


(40) 


< 


Ch)6 h 

2irn 


AU 


AU 


a \ n log- 1 - U log- 


where $U = \frac{1}{2\pi\gamma}$ is the envelope of the class $\mathcal{H}_h$ defined in Lemma 2. By choosing


a 2 := - > sup E 

7 z6R 2 

in (40) we get the result in expectation (24). 


(spic h ([zM - 7 =))' 


To prove the result in probability (25), we use Talagrand’s inequality as in Theorem 2.3 of Bousquet 
(2002). Let us define 


Z: =Wh K ~ E[ ^ 


rallloo- 


In view of the previous display (38), we have 


Var 7 (h6 h y 




k ^-h\ 


< J 2 (h6h)- 2 E 




< l 2 {hSh)- 2 -e^ h ~ 2 = 1. 


As U = 577 and by (D.l), it comes 

7 (M„)- 1 ||if 7 (.-^)^E[A 7 (.-^)] 


<7(/i4)- 1 ||7f 7 || 00 1. 
Then, for any x > 0 and with probability at least $1 - e^{-x}$, we obtain

$Z \le E[Z] + \sqrt{2xn + 4xE[Z]} + \frac{x}{3} \le E[Z] + \sqrt{2xn} + 2\sqrt{xE[Z]} + \frac{x}{3} \le 2E[Z] + \sqrt{2xn} + \frac{4x}{3},$

where we have used the elementary inequality $2ab \le a^2 + b^2$ with $a = \sqrt{x}$ and $b = \sqrt{E[Z]}$. Thus,
with probability at least $1 - e^{-x}$, we get


nwy - w 


77-7 


< 2E 


\\WZ-E[WZ]\\ C 



Plugging our control (24) on $E[\|\widehat W_h^\gamma - E[\widehat W_h^\gamma]\|_\infty]$, the result in probability follows.
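
The simplification of the Talagrand-Bousquet bound used above only relies on $\sqrt{u+v} \le \sqrt{u} + \sqrt{v}$ and $2ab \le a^2 + b^2$. The following sketch (the function name and the sample values are illustrative assumptions) evaluates the three successive upper bounds on Z so that their ordering can be checked for any values of E[Z], x and n.

```python
import math

def bousquet_chain(ez, x, n):
    """Return the three successive upper bounds on Z for given E[Z] = ez, x and n."""
    first = ez + math.sqrt(2 * x * n + 4 * x * ez) + x / 3
    second = ez + math.sqrt(2 * x * n) + 2 * math.sqrt(x * ez) + x / 3
    third = 2 * ez + math.sqrt(2 * x * n) + 4 * x / 3
    return first, second, third
```

The price of the simplification is the factor 2 in front of E[Z], which only affects constants in the final bound.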


Appendix B. Proof of Theorem 2 - Lower bounds 

B.1. Proof of Theorem 2 - Lower bounds for the $L^2$-norm. The proof of the minimax
lower bounds follows a standard scheme for deconvolution problems, as in the papers of Butucea,
Guţă and Artiles (2007) and Lounici and Nickl (2001). However, additional technicalities arise
to build a proper set of Wigner functions and then to derive a lower bound. From now on, for
the sake of brevity, we will denote $A(\beta, 2, L)$ by $A(\beta, L)$, as we consider the practical case r = 2.
Let $W_0 \in A(\beta, L)$ be a Wigner function. Its associated density function will be denoted by
$p_0(x,\phi) = \frac{1}{\pi} \mathcal{R}[W_0](x,\phi)\, 1_{[0,\pi]}(\phi).$


Let $M = \lfloor \sqrt{\log n} \rfloor$ be the integer part of $\sqrt{\log n}$, and
(41) $\delta := \log^{-1}(n).$

We suggest the construction of a family of M Wigner functions such that for all $m = 1, \dots, M$
and $w \in \mathbb{R}^2$:

$W_{m,h}(w) = W_0(w) + V_{m,h}(w), \quad 1 \le m \le M,$

depending on a parameter $h = h(n) \to 0$ as $n \to \infty$. The constructions of $W_0$ and $V_{m,h}$ are
discussed in Appendix B.1.1 and B.1.2. We denote by

$p_{m,h}(x,\phi) = \frac{1}{\pi} \mathcal{R}[W_{m,h}](x,\phi)\, 1_{[0,\pi]}(\phi)$

the associated density function of the Wigner function $W_{m,h}$. As we consider the noisy framework
(16) and in view of (17), we set for all $1 \le m \le M$

$p^\gamma_{m,h}(z,\phi) = [p_{m,h}(\cdot,\phi) * N^\gamma](z)$ and $p_0^\gamma(z,\phi) = [p_0(\cdot,\phi) * N^\gamma](z).$

If the following conditions (C1) to (C3) are satisfied, then Theorem 2.6 in the book of Tsybakov
(2009) gives the lower bound.

(C1) For all $m = 1, \dots, M$, $W_{m,h} \in A(\beta, L)$.

(C2) For any $1 \le k \ne m \le M$, we have $\|W_{k,h} - W_{m,h}\|_2^2 \ge 4\varphi_n^2$, with $\varphi_n^2 = O(n^{-\beta/(\beta+\gamma)})$.

(C3) For all $1 \le m \le M$,

$n\chi^2(p^\gamma_{m,h}, p_0^\gamma) := n \int_0^\pi\!\!\int \frac{(p^\gamma_{m,h}(z,\phi) - p_0^\gamma(z,\phi))^2}{p_0^\gamma(z,\phi)}\,dz\,d\phi \le \frac{M}{4}.$

Proofs of these three conditions are given in Appendix B.1.3 to B.1.5.


B.1.1. Construction of $W_0$. The Wigner function $W_0$ is the same as in the paper of Butucea,
Guţă and Artiles (2007). For the sake of completeness, we recall its construction here. The
probability density function associated to any density matrix $\rho$ in the ideal noiseless setting is
given by equation (9). In particular, for a diagonal density matrix $\rho$, the associated probability
density function is

$p_\rho(x,\phi) = \sum_{k=0}^\infty \rho_{kk}\, \psi_k^2(x).$
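For all $0 < \alpha, \lambda < 1$, we introduce a family of diagonal density matrices $\rho^{\alpha,\lambda}$ such that for all $k \in \mathbb{N}$

(42) $\rho^{\alpha,\lambda}_{kk} = \int_0^1 z^k\, \frac{\alpha (1-z)^\alpha}{(1-\lambda)^\alpha}\, 1_{\lambda \le z \le 1}\,dz.$

Therefore the probability density associated to this diagonal density matrix $\rho^{\alpha,\lambda}$ can be written
as follows:

(43) $p_{\alpha,\lambda}(x,\phi) = \sum_{k=0}^\infty \rho^{\alpha,\lambda}_{kk}\, \psi_k^2(x) = \sum_{k=0}^\infty \psi_k^2(x) \int_0^1 z^k\, \frac{\alpha (1-z)^\alpha}{(1-\lambda)^\alpha}\, 1_{\lambda \le z \le 1}\,dz.$

Moreover, by the well-known Mehler formula (see Erdélyi, Magnus, Oberhettinger and Tricomi
(1953)), we have

$\sum_{k=0}^\infty z^k \psi_k^2(x) = \frac{1}{\sqrt{\pi(1-z^2)}} \exp\left( -x^2\, \frac{1-z}{1+z} \right).$

Then we obtain

$p_{\alpha,\lambda}(x,\phi) = \frac{\alpha}{(1-\lambda)^\alpha} \int_0^1 \frac{(1-z)^\alpha}{\sqrt{\pi(1-z^2)}} \exp\left( -x^2\, \frac{1-z}{1+z} \right) 1_{\lambda \le z \le 1}\,dz.$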
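
The diagonal Mehler formula used in this step, $\sum_{k \ge 0} z^k \psi_k^2(x) = (\pi(1-z^2))^{-1/2} \exp(-x^2 (1-z)/(1+z))$, can be checked numerically. The sketch below is an illustration only: it assumes that $\psi_k$ denotes the orthonormal Hermite functions, evaluates them with the standard three-term recurrence, and truncates the series at a hypothetical order `kmax`.

```python
import math

def hermite_functions(x, kmax):
    """Orthonormal Hermite functions psi_0..psi_kmax at x via the three-term recurrence."""
    psi = [math.pi ** -0.25 * math.exp(-x * x / 2.0)]
    if kmax >= 1:
        psi.append(math.sqrt(2.0) * x * psi[0])
    for k in range(1, kmax):
        psi.append(math.sqrt(2.0 / (k + 1)) * x * psi[k]
                   - math.sqrt(k / (k + 1)) * psi[k - 1])
    return psi

def mehler_lhs(x, z, kmax=400):
    """Truncated series sum_k z^k psi_k(x)^2."""
    return sum(z ** k * p * p for k, p in enumerate(hermite_functions(x, kmax)))

def mehler_rhs(x, z):
    """Closed form exp(-x^2 (1-z)/(1+z)) / sqrt(pi (1-z^2))."""
    return math.exp(-x * x * (1 - z) / (1 + z)) / math.sqrt(math.pi * (1 - z * z))
```

For 0 < z < 1 the truncation error is of order $z^{kmax}$, so a few hundred terms are enough away from z = 1.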


The following lemma, proved in the paper of Butucea, Guţă and Artiles (2007), gives a control
on the tails of the associated density $p_{\alpha,\lambda}(x,\phi) = p_{\alpha,\lambda}(x)$, as it does not depend on $\phi$.

Lemma 3 (Butucea, Guţă and Artiles (2007)). For all $\phi \in [0, \pi]$, all $0 < \alpha, \lambda < 1$ and $|x| > 1$,
there exist constants c, C depending on $\alpha$ and $\lambda$ such that

$c\,|x|^{-(1+2\alpha)} \le p_{\alpha,\lambda}(x) \le C\,|x|^{-(1+2\alpha)}.$


In view of Lemma 3 of Butucea, Guţă and Artiles (2007), the Wigner function $W_0$ will be chosen
in the set

$\mathcal{W}_{\alpha,\lambda} = \{ W_{\rho^{\alpha,\lambda}}$ : Wigner function associated to $\rho^{\alpha,\lambda}$ : $0 < \alpha, \lambda < 1 \},$

with $\lambda$ close enough to 1 so that $W_0 \in A(\beta, L)$ (see Butucea, Guţă and Artiles (2007) for the proof
and details).


B.1.2. Construction of the set of Wigner functions $\mathcal{W}_{\delta,h}$ for the $L^2$-norm.

We define M + 1 infinitely differentiable functions such that:

• For all $m = 1, \dots, M$, $g_m : \mathbb{R} \to [0,1]$.

• The support of $g_m$ is $\mathrm{Supp}(g_m) = ]m\delta, (m+1)\delta[$.

• And $\forall t \in [(m + 1/3)\delta, (m + 2/3)\delta]$, $g_m(t) = 1$.

• An odd function $g : \mathbb{R} \to [-1,1]$, such that for some fixed $\epsilon > 0$, $g(x) = 1$ for any $x \ge \epsilon$.
Define also the following constants:

(44) $a_m := (h^{-2} + m\delta)^{1/2}, \quad b_m := (h^{-2} + (m+1)\delta)^{1/2}, \quad \forall m = 1, \dots, M.$

(45) $\tilde a_m := (h^{-2} + (m + 1/3)\delta)^{1/2}, \quad \tilde b_m := (h^{-2} + (m + 2/3)\delta)^{1/2}, \quad \forall m = 1, \dots, M.$

(46) $C_0 := \sqrt{\pi L (\beta + \gamma)}.$

We also introduce M infinitely differentiable functions such that:

• For all $m = 1, \dots, M$, $V_{m,h} : \mathbb{R}^2 \to \mathbb{R}$ is an odd real-valued function.

• Set $t = \sqrt{w_1^2 + w_2^2}$; then the function $V_{m,h}$ admits a Fourier transform with respect to
both variables equal to

(47) $\widetilde V_{m,h}(w) := \mathcal{F}_2[V_{m,h}](w) := i a C_0 h^{-1} e^{\beta h^{-2}} e^{-2\beta |t|^2} g_m(|t|^2 - h^{-2})\, g(w_2),$

where a > 0 is a numerical constant chosen sufficiently small. The bandwidth is such that

(48) $h = \left( \frac{\log n}{2(\beta + \gamma)} \right)^{-1/2}.$
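
To see how the choices (41), (46) and (48) fit together, the following sketch computes $M = \lfloor \sqrt{\log n} \rfloor$, $\delta = 1/\log n$ and $h = (\log n / (2(\beta+\gamma)))^{-1/2}$ for given n, β, γ. It is only an illustration of the formulas as read from the text (the function name is hypothetical); it makes visible the two identities used later in the proof, $e^{-2\beta h^{-2}} = n^{-\beta/(\beta+\gamma)}$ and $C_0^2 h^{-2} \delta = \pi L / 2$ with $C_0^2 = \pi L(\beta+\gamma)$.

```python
import math

def lower_bound_parameters(n, beta, gamma):
    """Return (M, delta, h) as in (41) and (48), read from the text above."""
    log_n = math.log(n)
    M = math.floor(math.sqrt(log_n))          # number of test functions
    delta = 1.0 / log_n                       # width of the frequency shells
    h = (log_n / (2.0 * (beta + gamma))) ** -0.5
    return M, delta, h
```

For instance, with n = 10^6, β = 1 and γ = 0.1 one gets M = 3, and the separation rate $n^{-\beta/(\beta+\gamma)}$ appears directly as $e^{-2\beta h^{-2}}$.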
Note that $\widetilde V_{m,h}(w)$ is infinitely differentiable and compactly supported, thus it belongs to the
Schwartz class $\mathcal{S}(\mathbb{R}^2)$ of fast decreasing functions on $\mathbb{R}^2$. The Fourier transform being a continuous
mapping of the Schwartz class onto itself, this implies that $V_{m,h}$ is also in the Schwartz class $\mathcal{S}(\mathbb{R}^2)$.
Moreover, $\widetilde V_{m,h}(w)$ is an odd function with purely imaginary values. Consequently, $V_{m,h}$ is an odd
real-valued function, and we get

(49) $\iint V_{m,h}(p,q)\,dp\,dq = \int \mathcal{R}[V_{m,h}](x,\phi)\,dx = 0,$

for all $\phi \in [0, \pi]$, where $\mathcal{R}[V_{m,h}]$ is the Radon transform of $V_{m,h}$. As in (8), we define

$p_{m,h}(x,\phi) = \frac{1}{\pi} \mathcal{R}[W_{m,h}](x,\phi)\, 1_{[0,\pi]}(\phi),$

(50) and $\rho^{(m,h)}_{j,k} = \iint p_{m,h}(x,\phi)\, f_{j,k}(x)\, e^{i(j-k)\phi}\,dx\,d\phi.$

By Lemma 6 in Appendix D.4, the matrix $\rho^{(m,h)}$ is proved to be a density matrix. Therefore,
in view of (9) and (49), the function $W_{m,h}$ is a Wigner function. Now, we can define our set of
Wigner functions

(51) $\mathcal{W}_{\delta,h} = \{ W_{m,h} : \mathbb{R}^2 \to \mathbb{R},\ W_{m,h}(z) = W_0(z) + V_{m,h}(z),\ m = 1, \dots, M \},$

where $W_0$ is the Wigner function associated to the density $p_0$ defined in (42).


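B.1.3. Condition (C1). By the triangle inequality and for any $1 \le m \le M$, we have

$\|\widetilde W_{m,h}\, e^{\beta \|\cdot\|^2}\|_2 \le \|\widetilde W_0\, e^{\beta \|\cdot\|^2}\|_2 + \|\widetilde V_{m,h}\, e^{\beta \|\cdot\|^2}\|_2.$

The first term in the above sum has been bounded in Lemma 3 of Butucea, Guţă and Artiles (2007)
as follows:

(52) $\|\widetilde W_0\, e^{\beta \|\cdot\|^2}\|_2^2 \le \pi^2 L.$

To study the second term in the sum above, we consider the change of variables $w = (t\cos\phi, t\sin\phi)$
and, as g is bounded by 1, we get from (41), (44) and (46) that

$\|\widetilde V_{m,h}\, e^{\beta \|\cdot\|^2}\|_2^2 \le a^2 C_0^2 h^{-2} e^{2\beta h^{-2}} \int e^{-2\beta \|w\|^2} g_m^2(\|w\|^2 - h^{-2})\,dw$

$\le a^2 C_0^2 h^{-2} e^{2\beta h^{-2}} \int_0^\pi\!\!\int_{a_m \le |t| \le b_m} |t|\, e^{-2\beta t^2}\,dt\,d\phi$

$\le 2\pi a^2 C_0^2 h^{-2} e^{2\beta h^{-2}} e^{-2\beta a_m^2} \int_{a_m}^{b_m} t\,dt = \pi a^2 C_0^2 h^{-2} e^{-2\beta m\delta}\, [b_m^2 - a_m^2]$

(53) $\le \pi a^2 C_0^2 h^{-2} \delta\, e^{-2\beta m\delta} \le \pi^2 L,$

for a small enough, since $C_0^2 h^{-2} \delta = \pi L / 2$. Combining (52) and (53), we get $W_{m,h} \in A(\beta, L)$ for any $1 \le m \le M$.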

B.1.4. Condition (C2). By applying the Plancherel theorem and the change of variables $w =
(t\cos\phi, t\sin\phi)$, we have, since the supports $\mathrm{Supp}(g_k)$ and $\mathrm{Supp}(g_m)$ are disjoint for any $k \ne m$,
that

$\|W_{k,h} - W_{m,h}\|_2^2 = \|V_{k,h} - V_{m,h}\|_2^2 = \frac{1}{4\pi^2} \iint |t|\, |\widetilde V_{k,h}(t,\phi) - \widetilde V_{m,h}(t,\phi)|^2\,dt\,d\phi$

(54) $= \frac{a^2 C_0^2}{4\pi^2}\, h^{-2} e^{2\beta h^{-2}} \iint |t|\, e^{-4\beta t^2} g^2(t\sin\phi)\, [g_k^2(t^2 - h^{-2}) + g_m^2(t^2 - h^{-2})]\,dt\,d\phi.$

Note that for a fixed $\mu \in ]0, \pi/4[$, there exists a numerical constant c > 0 such that $\sin(\phi) > c$ on
$]\mu, \pi - \mu[$. From now on, we denote by $A_m$ the set

(55) $A_m := \{ w \in \mathbb{R}^2 : h^{-2} + (m + 1/3)\delta \le \|w\|^2 \le h^{-2} + (m + 2/3)\delta \}, \quad \forall m = 1, \dots, M,$

which we identify below with the corresponding set of values of t, where $t^2 = \|w\|^2$.
By definition of g and for n large enough, we have for any $(t,\phi) \in (A_k \cup A_m) \times ]\mu, \pi - \mu[$ that
$g^2(t\sin\phi) = 1$. Therefore, (54) can be lower bounded as follows:

$\|W_{k,h} - W_{m,h}\|_2^2 \ge \frac{a^2 C_0^2}{4\pi^2}\, h^{-2} e^{2\beta h^{-2}} \int_\mu^{\pi-\mu}\!\!\int_{A_k \cup A_m} |t|\, e^{-4\beta t^2} [g_k^2(t^2 - h^{-2}) + g_m^2(t^2 - h^{-2})]\,dt\,d\phi$

(56) $= \frac{\pi - 2\mu}{4\pi^2}\, a^2 C_0^2 h^{-2} e^{2\beta h^{-2}} \int_{A_k \cup A_m} |t|\, e^{-4\beta t^2} [g_k^2(t^2 - h^{-2}) + g_m^2(t^2 - h^{-2})]\,dt.$

On $A_m$ and by construction of the function $g_m$, we have

$g_m(t^2 - h^{-2}) = 1, \quad 1 \le m \le M.$

The constants defined in (45) are such that for k > m, we have $\tilde a_m < \tilde b_m < \tilde a_k < \tilde b_k$. Whence, since
$A_m$ and $A_k$ are disjoint sets for any k > m, it follows that

$I := \int_{A_k \cup A_m} |t|\, e^{-4\beta t^2} [g_k^2(t^2 - h^{-2}) + g_m^2(t^2 - h^{-2})]\,dt \ge e^{-4\beta \tilde b_k^2} \int_{A_k \cup A_m} |t|\, [g_k^2(t^2 - h^{-2}) + g_m^2(t^2 - h^{-2})]\,dt$

(57) $\ge 2 e^{-4\beta \tilde b_k^2} \int_{\tilde a_k}^{\tilde b_k} t\,dt \ge e^{-4\beta \tilde b_k^2} (\tilde b_k^2 - \tilde a_k^2) \ge \frac{\delta}{3}\, e^{-4\beta \tilde b_k^2}.$

Combining (56) and (57), we get, since $C_0^2 h^{-2} \delta = \pi L / 2$,

$\|W_{k,h} - W_{m,h}\|_2^2 \ge \frac{\pi - 2\mu}{12\pi^2}\, a^2 C_0^2 h^{-2} e^{2\beta h^{-2}} \delta\, e^{-4\beta \tilde b_k^2} = \frac{\pi - 2\mu}{24\pi}\, a^2 L\, e^{2\beta h^{-2}} e^{-4\beta \tilde b_k^2} = \frac{\pi - 2\mu}{24\pi}\, a^2 L\, e^{-2\beta h^{-2}} e^{-4\beta (k + 2/3)\delta}.$

Since $1 \le k \le M \le 1/\delta$ and by (48), we obtain

$\|W_{k,h} - W_{m,h}\|_2^2 \ge \frac{\pi - 2\mu}{24\pi}\, a^2 L\, e^{-4\beta - 8\beta/3}\, e^{-2\beta h^{-2}} \ge 4 c\, n^{-\beta/(\beta+\gamma)} =: 4\varphi_n^2,$

where c > 0 is a numerical constant.


B.1.5. Condition (C3). Denote by C > 0 a constant whose value may change from line to line
and recall that $N^\gamma$ is the density of the Gaussian distribution with zero mean and variance $2\gamma$.
Note that $p_0$ and $N^\gamma$ do not depend on $\phi$. Consequently, in the framework of noisy data defined
in (16), $p_0^\gamma(z,\phi) = p_0^\gamma(z)\, \frac{1}{\pi} 1_{(0,\pi)}(\phi)$.

Lemma 4. There exist numerical constants c' > 0 and c'' > 0 such that

(58) $p_0^\gamma(z) \ge c' z^{-2}, \quad \forall |z| \ge 1 + \sqrt{2\gamma},$

and

(59) $p_0^\gamma(z) \ge c'', \quad \forall |z| \le 1 + \sqrt{2\gamma}.$

The proof of this lemma is given in Appendix D.3. Using Lemma 4, the $\chi^2$-divergence can be
upper bounded as follows:



( 60 ) 
First underline, as in (18), that the Fourier transforms of $p^\gamma_{m,h}$ and $p_0^\gamma$ with respect to the first variable
are equal respectively to

(61) $\mathcal{F}_1[p^\gamma_{m,h}(\cdot,\phi)](t) = \widetilde W_{m,h}(t\cos\phi, t\sin\phi)\, \widetilde N^\gamma(t) = \left( \widetilde V_{m,h}(t\cos\phi, t\sin\phi) + \widetilde W_0(t\cos\phi, t\sin\phi) \right) e^{-\gamma t^2},$

(62) $\mathcal{F}_1[p_0^\gamma(\cdot,\phi)](t) = \widetilde W_0(t\cos\phi, t\sin\phi)\, e^{-\gamma t^2},$

since $\widetilde N^\gamma(t) = e^{-\gamma t^2}$. Using the Plancherel theorem, (47), and equations (61) and (62), the first
integral $I_1$ in the sum (60) is bounded by

h <11 {pl h { z A)-pZ{z,4>)) dzdcj)=- X — I I |.Fi[p 7 h (-,</>)] (t) -J"i [pi (■,<£)](*) 


1 

4-7T 2 


2 /~i2 

a L 0 j^—2^2/3h 2 


V m) h (t cos <j>, t sin <j >) 


,-27*' 


4-7T 2 

dtd(f> 


dtdcj) 


47T 2 


0 -4/3t 2 -2 7 t 2 2 02 


9m( t2 - h 2 ) g 2 (t sin cj))dtdcj). 


By construction, the function g is bounded by 1 and the function $g_m$ admits as support $\mathrm{Supp}(g_m) =
]m\delta, (m+1)\delta[$ for all $m = 1, \dots, M$. Thus,


2 C. 


C 2 


h < I e-^ -^ gl (t 2 - h~ 2 ) dt < h~ 2 e 2 ^ h / e^ ^ dt 


4TT 


< ^l( bm -a m )h- 2 e^ h ~ 2 e- 4 ^- 2 ^ < ^ ^ ^ fe - 2 e 2 ^- 2 -^a 2 m -2 7 a 2 

47 T 47 t 2a m 

Some basic algebra, (41), (44), (46) and (48) yield 

,. 2 / 

(63) 


n a l C 

—,h — / I ’ 
c" V i°g ** 


for some constant C > 0 which may depend on $\beta$, $\gamma$, L and c''. For the second term $I_2$ in
the sum (60), with the same tools we obtain, using in addition the spectral representation of the
differential operator, that


I 2 < 


JJ z 2 {fm,h( z ^) ~Po(z,<f>j) dzdcj> 
J j (y mth {t cos<t),tsva.(t>)e~ ir 


dtd(j> 


dtdcj) 


_ .2 (J ~ _ ,2 ~ 

e -7 — (V mt h)(t cos (f>,t sin (f>) — 2 r )/te~' y V m , h (t cos cj>,t sin cj>) 
(64) <2 II e~ 2lt |/ 2 ,i | 2 dtdcj) + I 67 2 f f t 2 e~ 2lt |/ 2 ,i | 2 dtdcj), 


dtdcj) 


where $I_{2,2} = \widetilde V_{m,h}(t\cos\phi, t\sin\phi)$ and $I_{2,1}$, the partial derivative $\frac{\partial}{\partial t}(\widetilde V_{m,h})(t\cos\phi, t\sin\phi)$, is equal
to

$i a C_0 h^{-1} e^{\beta h^{-2}} e^{-2\beta t^2} \left[ g_m(t^2 - h^{-2}) \left( -4\beta t\, g(t\sin\phi) + g'(t\sin\phi) \sin\phi \right) + 2t\, g_m'(t^2 - h^{-2})\, g(t\sin\phi) \right].$

Since $g_m$ and g belong to the Schwartz class, there exists a numerical constant $c_S > 0$ such that
$\max\{\|g_m\|_\infty, \|g_m'\|_\infty, \|g\|_\infty, \|g'\|_\infty\} \le c_S$. Furthermore, for all $m = 1, \dots, M$, the support of the
function $g_m$ is $\mathrm{Supp}(g_m) = ]m\delta, (m+1)\delta[$, then


(65) $|I_{2,1}|^2 \le a^2 c_S^4 C_0^2 h^{-2} e^{2\beta h^{-2}} e^{-4\beta t^2} ((4\beta + 2)|t| + 1)^2\, 1_{(a_m, b_m)}(|t|),$

with $a_m$ and $b_m$ defined in (44). Similarly, we have

(66) $|I_{2,2}|^2 = \left| a C_0 h^{-1} e^{\beta h^{-2}} e^{-2\beta t^2} g_m(t^2 - h^{-2})\, g(t\sin\phi) \right|^2 \le a^2 c_S^4 C_0^2 h^{-2} e^{2\beta h^{-2} - 4\beta t^2}\, 1_{(a_m, b_m)}(|t|).$


Combining (65) and ( 66 ) with (64), as 0 < mS < 1 

rbm 


h < 2a 2 CgC 2 h~ 2 e 2 ^ h 


e -2 7 C e -4/3C 


0 J a n 


((4/3 + 2)|t| + l) z + 8 7 2 f 2 


< 2ira 2 4C 2 0 h- 2 e 2 P h ~ 2 e~W +2 ^ a 


((4/3 + 2 )b m + l ) 2 + 8 y“ 6 2 


dtd(j> 


dt 


< 27ra 2 4C' 0 2 /i- 2 e- 2(/3+7) ' 1 V 2 ( 2 / 3 + 7 )m <5 [((4^ + 2 ) 6 m + l ) 2 + 8 7 2 6 2 


b 2 -a 2 

u m U'm 
2,Cl ni 


< 2TTa 2 c%Clh- 2 e~ 2 ^ )h ~ 2 \((4/3 + 2)b m + l ) 2 + 8 7 2 6 2 


J 2 


Some basic algebra, (41), (44), (46) and (48) yield

(67) $\frac{n}{c'} I_2 \le a^2 C \sqrt{\log n},$

for some constant C > 0 which may depend on $\beta$, $\gamma$, L, $c_S$ and c'. Combining (67) and (63) with
(60), we get for n large enough

$n\chi^2(p^\gamma_{m,h}, p_0^\gamma) := n \int_0^\pi\!\!\int_{\mathbb{R}} \frac{(p^\gamma_{m,h}(z,\phi) - p_0^\gamma(z,\phi))^2}{p_0^\gamma(z,\phi)}\,dz\,d\phi \le a^2 C \sqrt{\log n},$

where C > 0 is a constant which may depend on $\beta$, $\gamma$, L, $c_S$, c'' and c'. Taking the numerical
constant a > 0 small enough, we deduce from the previous display that

$n\chi^2(p^\gamma_{m,h}, p_0^\gamma) \le \frac{M}{4},$

since $M = \lfloor \sqrt{\log n} \rfloor$.


B.2. Proof of Theorem 2 - Lower bounds for the sup-norm. To prove the lower bound for
the sup-norm, we need to slightly modify the construction of the class of Wigner functions $\mathcal{W}_{\delta,h}$ defined in
(51) into

(68) $\mathcal{W}_{\delta,h,\epsilon} = \{ W_{m,h,\epsilon} : \mathbb{R}^2 \to \mathbb{R},\ W_{m,h,\epsilon}(z) = W_0(z) + V_{m,h,\epsilon}(z),\ m = 1, \dots, M \},$

where $W_0$, the Wigner function associated to the density $p_0$ defined in (42), stays unchanged as
compared to the $L^2$ case. However, the construction of the functions $\{V_{m,h}\}_m$ defined in (47)
is changed only through the modification of the functions $g_m$ and g into $g_{m,\epsilon}$ and $g_\epsilon$ respectively, for
$0 < \epsilon < 1$.


We define M + 1 infinitely differentiable functions such that:

• For all $m = 1, \dots, M$, $g_{m,\epsilon} : \mathbb{R} \to [0,1]$.

• The support of $g_{m,\epsilon}$ is $\mathrm{Supp}(g_{m,\epsilon}) = ]m\delta, (m+1)\delta[$.

• Using a similar construction as for the function $g_m$, we can also assume that

(69) $g_{m,\epsilon}(t) = 1, \quad \forall t \in B_{m,\epsilon} := [(m + \epsilon)\delta, (m + 1 - \epsilon)\delta],$

and

(70) $\|g_{m,\epsilon}'\|_\infty \le \frac{c}{\epsilon\delta},$

for some numerical constant c > 0.
• An odd function $g_\epsilon : \mathbb{R} \to [-1,1]$ satisfies the same conditions as g above, but we assume
in addition that

(71) $\|g_\epsilon'\|_\infty \le \frac{c}{\epsilon}$

for some numerical constant c > 0.

The condition (71) will be needed to check Condition (C3). Such a function can be easily
constructed. Consider for instance a function $g_\epsilon$ such that its derivative satisfies

$g_\epsilon'(t) = \psi * 1_{(0,\epsilon)}(t),$

for any $t \in (0, \epsilon)$, where $\psi$ is a mollifier. Integrate this function and renormalize it properly so that
$g_\epsilon(t) = 1$ for any $t \ge \epsilon$. Complete the function by symmetry to obtain an odd function defined on
the whole real line. Such a construction satisfies condition (71).
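
A construction of this kind can be sketched numerically. The code below is a simplified illustration, not the exact construction of the text: instead of the convolution $\psi * 1_{(0,\epsilon)}$ it takes the derivative on $(0, \epsilon)$ to be a rescaled smooth bump directly (an assumption of this sketch), integrates it with a midpoint rule, normalises so that $g_\epsilon(t) = 1$ for $t \ge \epsilon$, and extends by oddness. The sup of the derivative then scales like $c/\epsilon$, as required by (71).

```python
import math

def bump(u):
    """Smooth bump supported on (0, 1), used here as the shape of the derivative."""
    return math.exp(-1.0 / (u * (1.0 - u))) if 0.0 < u < 1.0 else 0.0

def _bump_integral(u, steps=2000):
    """Midpoint-rule integral of bump over (0, u), with u clamped to [0, 1]."""
    u = min(max(u, 0.0), 1.0)
    du = u / steps
    return sum(bump((i + 0.5) * du) for i in range(steps)) * du

_TOTAL = _bump_integral(1.0)

def g_eps(t, eps):
    """Odd smooth step: equals 1 for t >= eps and -1 for t <= -eps."""
    sign = 1.0 if t >= 0 else -1.0
    return sign * _bump_integral(abs(t) / eps) / _TOTAL

def g_eps_prime_sup(eps):
    """sup |g_eps'| = max(bump) / (eps * integral of bump), i.e. c / eps."""
    return bump(0.5) / (eps * _TOTAL)
```

Because the bump vanishes to all orders at the endpoints of its support, the odd extension is smooth at the origin.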


It is easy to see that Condition (Cl) is always satisfied by the new test functions {W mt h, £ } m - To 
check Condition (C2) set Ch = iaC^h~ 1 e^ h ~ and then we have 

W kth>e (z) - W mth>e (z) = ^ JJ (\V kthte (w)-W m>h>e (w)j dw 

= i J J e ~ zt[z,,l>] \t\ (Wfc )/()e (f costMsin^) 

—W m ,h,e(t cos (/>,t sin (j>)j dtdcj) 

= i C h e ~ 2/3t2 (g k , € ~ g m ,e) (t 2 - h~ 2 ) g e (t)dtd(j). 

For all zel 2 and B m o = lim e _ > o B m ,e defined in (69), we define the following quantity 

I{z) := yj \t\Che~ 2,3t2 [ls fei0 - ls m , 0 ] ( t 2 - h~ 2 ) [l( 0 ,oo)W - l(-oo,o)W] dtd,<f>. 

Lebesgue dominated convergence Theorem guarantees that 

lim (/ J e~ lt { z ’^\t\C h e~ 2 l 3 t 2 (g k , £ - g m ,e) (t 2 ~ hr 2 ) g € (t)dtd$J = I(z). 

Therefore, there exists an e > 0 (possibly depending on n, z) such that 

\W k ,h,e( z ) ~ W m,hA z )I > ^ 1-7(2:)| ■ 

Taking z = (0, 2 h), Fubini’s Theorem gives 
1{z) = 

X [l(0,oo)(^) 1(—oo,0)(^)] dtd(f) 

^ [l(0,Oo)(7) 1( — OO,0) (7)] dt. 


Note that 



= w(iHo(2ht) + Jo{2ht)), 


where $H_0$ and $J_0$ denote respectively the Struve and Bessel functions of order 0. By definition,
$H_0$ is an odd function while $J_0$ and $t \mapsto |t|\, C_h e^{-2\beta t^2} [1_{B_{k,0}} - 1_{B_{m,0}}](t^2 - h^{-2})$ are even functions.
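
The appearance of the Struve and Bessel functions comes from the classical identity $\int_0^\pi e^{i x \sin\phi}\,d\phi = \pi (J_0(x) + i H_0(x))$. As a hedged illustration (the power series truncation and the midpoint rule are choices made for this sketch, and the function names are hypothetical), the code below compares both sides and also evaluates $H_0$ at the point 2, which corresponds to $t = h^{-1}$ in $H_0(2ht)$.

```python
import math

def bessel_j0(x, terms=60):
    """Power series of the Bessel function J_0."""
    return sum((-1) ** k * (x / 2.0) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def struve_h0(x, terms=60):
    """Power series of the Struve function H_0."""
    return sum((-1) ** k * (x / 2.0) ** (2 * k + 1) / math.gamma(k + 1.5) ** 2
               for k in range(terms))

def angular_integral(x, steps=20000):
    """Midpoint rule for int_0^pi exp(i*x*sin(phi)) dphi, returned as (real, imag)."""
    d = math.pi / steps
    re = sum(math.cos(x * math.sin((i + 0.5) * d)) for i in range(steps)) * d
    im = sum(math.sin(x * math.sin((i + 0.5) * d)) for i in range(steps)) * d
    return re, im
```

The real part is even in x and the imaginary part is odd, which is the parity property exploited in the argument.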
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Consequently, we get 

I{z) = ~^ iC h j \t\H 0 {2ht)e~ 2/3t2 [l Ak o - l Am 0 ] ( t 2 - h ~ 2 ) [l (0jOo) (2 ht) - l ( _ OO)0) (t)] dt 

1 f°° 

= ^ iC h j tH 0 (2ht)e- 2 ^ 2 [l AktO - l Am J ( t 2 - h ~ 2 ) dt 

= ^ " tH 0 {2ht)e~ m2 dt - j™ tH o {2ht)e~ 20t2 dt]^j , 

with $a_k$ and $b_k$ defined in (44). For some numerical constant c > 0,

$]a_m, b_m[ \subset [h^{-1}, h^{-1} + ch\delta].$

On $[h^{-1}, h^{-1} + ch\delta]$ and for n large enough, the functions $t \mapsto H_0(2ht)$ and $t \mapsto t e^{-2\beta t^2}$ are decreasing
and

$\min_{t \in [h^{-1}, h^{-1} + ch\delta]} \{ H_0(2ht) \} \ge 1/2.$

Assume without loss of generality that k < m. We easily deduce from the previous observations
that

m\ > 


> 


\Ch[ I 

47T \ 

1^1 

16n/3 


r b k fbm N ' 

te-W dt- / te-W dt 


o-20 a 2 k 


o-20b 2 k + e -20b 2 m _ e — 2 j 8 . 


A) 


> hftLg-2 0h 2 e ~20amS_ e -20a{m-k)S\^ _ e ~20S\ 
~ 107T/3 

Therefore, some simple algebra gives

$|I(z)| \ge c\, \delta^2 |C_h|\, n^{-\beta/(\beta+\gamma)} \ge a c'\, n^{-\beta/(2(\beta+\gamma))} \log^{-3/2}(n),$

for some numerical constants c, c' > 0 depending only on $\beta$. Taking the numerical constant a >
0 small enough independently of n, $\beta$, $\gamma$, we get that Condition (C2) is satisfied with $\varphi_n =
c\, n^{-\beta/(2(\beta+\gamma))} \log^{-3/2}(n)$.

Concerning Condition (C3), we proceed similarly as above for the quadratic risk. The only
modification appears in (65)-(66), where we now use (69)-(70) combined with the fact that

$|\mathrm{Supp}(g_\epsilon')| \le 2\epsilon$ and $|\mathrm{Supp}(g_{m,\epsilon}')| \le 2\delta\epsilon$

by construction of these functions. Therefore, the details will be omitted here.


Appendix C. Proof of Theorem 3 - Adaptation 
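The following lemma is needed to prove Theorem 3.

Lemma 5. For $\kappa > 0$ a constant, let $\mathcal{E}_\kappa$ be the event defined by

(72) $\mathcal{E}_\kappa = \bigcap_{m=1}^M \left\{ \|\widehat W^\gamma_{h_m} - E[\widehat W^\gamma_{h_m}]\|_\infty \le \kappa\, e^{\gamma h_m^{-2}} r_n(x + \log M) \right\}.$

Then, on the event $\mathcal{E}_\kappa$,

$\|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty \le C \min_{1 \le m \le M} \left\{ h_m^{r/2 - 1} e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}} r_n(x + \log M) \right\},$

where C > 0 is a constant depending only on $\gamma$, $\beta$, L, r, $\kappa$ and $\widehat W^\gamma_{h_{\hat m}}$ is the adaptive estimator with
the bandwidth $h_{\hat m}$ defined in (30).

The proof of the previous lemma is given in D.2. For any fixed $m \in \{1, \dots, M\}$, we have in view
of Proposition 2 that

$P\left( \|\widehat W^\gamma_{h_m} - E[\widehat W^\gamma_{h_m}]\|_\infty \le C_2\, e^{\gamma h_m^{-2}} r_n(x) \right) \ge 1 - e^{-x},$

where $r_n(x) = \max\left( \sqrt{\frac{1+x}{n}}, \frac{1+x}{n} \right)$. By a simple union bound, we get

$P\left( \bigcap_{1 \le m \le M} \left\{ \|\widehat W^\gamma_{h_m} - E[\widehat W^\gamma_{h_m}]\|_\infty \le C_2\, e^{\gamma h_m^{-2}} r_n(x) \right\} \right) \ge 1 - M e^{-x}.$

Replacing x by $(x + \log M)$ implies

$P\left( \bigcap_{1 \le m \le M} \left\{ \|\widehat W^\gamma_{h_m} - E[\widehat W^\gamma_{h_m}]\|_\infty \le C_2\, e^{\gamma h_m^{-2}} r_n(x + \log M) \right\} \right) \ge 1 - e^{-x}.$

For $\kappa \ge C_2$, we immediately get that $P(\mathcal{E}_\kappa) \ge 1 - e^{-x}$ and the result in probability (31) follows
by Lemma 5. To prove the result in expectation (32), we use the property $E[Z] = \int_0^\infty P(Z \ge t)\,dt$,
where Z is any positive random variable. We have indeed for any $1 \le m \le M$ that

$P\left( \|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty \ge C \left( h_m^{r/2 - 1} e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}} r_n(x + \log M) \right) \right) \le e^{-x}, \quad \forall x > 0.$

Note that

$r_n(x + \log M) = \max\left\{ \sqrt{\frac{x + \log(eM)}{n}}, \frac{x + \log(eM)}{n} \right\} \le \max\left\{ \sqrt{\frac{\log(eM)}{n}}, \frac{\log(eM)}{n} \right\} + \max\left\{ \sqrt{\frac{x}{n}}, \frac{x}{n} \right\} = r_n(\log M) + r_n(x - 1).$

Combining the two previous displays, we get $\forall x > 0$

$P\left( \|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty \ge C \left( h_m^{r/2 - 1} e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}} [r_n(\log M) + r_n(x - 1)] \right) \right) \le e^{-x}.$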
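
The splitting $r_n(x + \log M) \le r_n(\log M) + r_n(x - 1)$ used in the last two displays follows from $\sqrt{u + v} \le \sqrt{u} + \sqrt{v}$ applied to $u = \log(eM)/n$ and $v = x/n$. A minimal numerical sketch, with $r_n(x) = \max(\sqrt{(1+x)/n}, (1+x)/n)$ as recalled above and illustrative values of n, M and x chosen here:

```python
import math

def r_n(x, n):
    """r_n(x) = max(sqrt((1+x)/n), (1+x)/n)."""
    u = (1.0 + x) / n
    return max(math.sqrt(u), u)

def split_gap(x, n, M):
    """r_n(log M) + r_n(x - 1) - r_n(x + log M); non-negative for x > 0."""
    return r_n(math.log(M), n) + r_n(x - 1.0, n) - r_n(x + math.log(M), n)
```

The gap is small when one of the two terms dominates, which is why the splitting costs only a constant factor in the final bound.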

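Set $Y = \|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty / C$, $a = h_m^{r/2 - 1} e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}} r_n(\log M)$ and $b = e^{\gamma h_m^{-2}}$. We have

$E[Y] = a + E[Y - a] \le a + \int_0^\infty P(Y - a \ge u)\,du = a + b \int_0^\infty P(Y - a \ge bt)\,dt.$

Set now $t = r_n(x - 1)$. If $0 < t \le 1$, then we have $t = \sqrt{x/n}$. If t > 1, then we have $t = x/n$. Thus we
get by the change of variable $t = \sqrt{x/n}$ that

$\int_0^1 P(Y - a \ge bt)\,dt = \int_0^n P\left( Y - a \ge b \sqrt{\frac{x}{n}} \right) \frac{dx}{2\sqrt{nx}} \le \frac{1}{2\sqrt{n}} \int_0^n \frac{e^{-x}}{\sqrt{x}}\,dx \le \frac{c}{\sqrt{n}},$

where c > 0 is a numerical constant. Similarly, we get by the change of variable $t = x/n$

$\int_1^\infty P(Y - a \ge bt)\,dt = \int_n^\infty P\left( Y - a \ge b\, \frac{x}{n} \right) \frac{dx}{n} \le \frac{1}{n} \int_n^\infty e^{-x}\,dx \le \frac{c'}{n},$

where c' > 0 is a numerical constant. Combining the last three displays, we obtain the result in
expectation.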


Appendix D. Proof of Auxiliary Lemmas 
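D.1. Proof of Lemma 2. To prove the uniform bound of (36), we define

$\delta_h = \max_{|t| \le h^{-1}} \left\{ |t|\, e^{\gamma t^2} \right\}.$

Then, by definition of $K_h^\gamma$ and by using the inverse Fourier transform formula, we have

$\|\delta_h^{-1} K_h^\gamma\|_\infty = \frac{1}{2\pi}\, \delta_h^{-1} \sup_x \left| \int_{-h^{-1}}^{h^{-1}} e^{-itx}\, \widetilde K_h^\gamma(t)\,dt \right| \le \frac{1}{2\pi}\, \delta_h^{-1} \int_{-h^{-1}}^{h^{-1}} |t|\, e^{\gamma t^2}\,dt = \frac{1}{\pi}\, \delta_h^{-1} \int_0^{h^{-1}} t\, e^{\gamma t^2}\,dt$

(73) $= \frac{1}{2\gamma\pi}\, \delta_h^{-1} \int_0^{h^{-1}} 2\gamma t\, e^{\gamma t^2}\,dt = \frac{1}{2\gamma\pi}\, \delta_h^{-1} (e^{\gamma h^{-2}} - 1) \le \frac{1}{2\gamma\pi} := U.$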
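
The uniform bound can be illustrated numerically. The sketch below assumes, as the computation above suggests, that the kernel has Fourier transform $|t| e^{\gamma t^2} 1_{|t| \le 1/h}$, so that $K_h^\gamma(x) = \frac{1}{\pi} \int_0^{1/h} t\, e^{\gamma t^2} \cos(tx)\,dt$; the midpoint rule and the function names are choices made for this illustration. It compares $K_h^\gamma(0)$ with the closed form $(e^{\gamma h^{-2}} - 1)/(2\pi\gamma)$ and checks that $\delta_h^{-1} |K_h^\gamma|$ stays below $U = 1/(2\pi\gamma)$.

```python
import math

def kernel(x, h, gamma, steps=4000):
    """Midpoint-rule value of (1/pi) * int_0^{1/h} t*exp(gamma*t^2)*cos(t*x) dt."""
    dt = (1.0 / h) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += t * math.exp(gamma * t * t) * math.cos(t * x)
    return total * dt / math.pi

def delta_h(h, gamma):
    """Normalisation delta_h = h^{-1} * exp(gamma / h^2)."""
    return math.exp(gamma / h ** 2) / h

def envelope(gamma):
    """Uniform bound U = 1 / (2*pi*gamma)."""
    return 1.0 / (2.0 * math.pi * gamma)
```

The normalisation by $\delta_h$ is what removes the exponential growth in $h^{-2}$ of the kernel's sup-norm.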


For the entropy bound (37), we need to prove that $K_h^\gamma(\cdot)$ admits finite quadratic variation, i.e.
$K_h^\gamma \in V_2(\mathbb{R})$, where $V_2(\mathbb{R})$ is the set of functions with finite quadratic variation (see Theorem
5 of Bourdaud, Lanza de Cristoforis and Sickel (2006)). To do this, it is enough to verify that
$K_h^\gamma \in B^{1/2}_{2,1}(\mathbb{R})$, and the result is a consequence of the embedding $B^{1/2}_{2,1}(\mathbb{R}) \subset V_2(\mathbb{R})$.

Let us define the Littlewood-Paley characterization of the seminorm || • ||jy 2 2i as follow 

I 6 Z 

where cq(-) is a dyadic partition of unity with cq symmetric w.r.t to 0, supported in 

[—2 i+1 , —2 /_1 ] U [2 i_1 ,2 i+1 ] 

and 0 < < 1 (see e.g. Theorem 6.3.1 and Lemma 6.1.7 in the paper of Bergh and Lofstrom 

(1976)). Then, K^ £ B^CR), if and only if ||At(IIi /2 2 1 is bounded by a fixed constant. By 
isometry of the Fourier transform combining with definition of cq and R’7, we get that 

= ||a J 7-i[i^]||2 = ||a J ^|| 2 


= 4 2 


'[0,h.- 1 ]n[2 i - 1 ,2 i + 1 ] 


ai{t) 2 \t\ 2 e 2 ^ t2 dt 


< 


£2g27 1 2 dt. 


y J[o,fc- 1 ]n[2‘“i,2<+ 1 ] 

A primitive of t —> f 2 e 27 * 2 is ^-te 2lt " — f* e 2ju2 du. Thus, we get that 


< A /J/i- 1/2 e 7 ' 1 " 2 , VI £ Z 


and ll A 7lll/2,2,l < \j^~ h 1,2eih 2 E 2V2 ’ 

l ——00 

where Lh = Llog 2 (fr _1 ) + lj. A simple computation gives that 

, /2 V2 2 ( i '*+ 1 )/ 2 -l y/2 2 , 1/2 

~ y /2 — 1 v ^-1 ~ y / 2-1 y / 2-1 

Combining the last two displays and since /i -1 > 1, we get 

irai/ 2 , 2,1 < cfy-W\ 

where c > 0 is a numerical constant. This shows that $\delta_h^{-1} \|K_h^\gamma\|_{1/2,2,1}$ is bounded by a fixed
constant depending only on $\gamma$. Therefore $K_h^\gamma \in V_2(\mathbb{R})$ and the entropy bound (37) is obtained by
applying Lemma 1 of Giné and Nickl (2009).
D.2. Proof of Lemma 5. We recall that the bandwidth $h_{\hat m}$ with $\hat m$ is defined in (30). Let
$r_n(x) = \max\left( \sqrt{\frac{1+x}{n}}, \frac{1+x}{n} \right)$ and define

(74) $m^* := \operatorname{argmin}_{1 \le m \le M} \left\{ h_m^{r/2 - 1} e^{-\beta h_m^{-r}} + e^{\gamma h_m^{-2}} r_n(x + \log M) \right\},$

and

$B(m) = \max_{j : j > m} \left\{ \|\widehat W^\gamma_{h_m} - \widehat W^\gamma_{h_j}\|_\infty - 2\kappa\, e^{\gamma h_j^{-2}} r_n(x + \log M) \right\}.$

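On the one hand, we have

$\|\widehat W^\gamma_{h_{\hat m}} - \widehat W^\gamma_{h_{m^*}}\|_\infty 1_{\hat m > m^*} = \left( \|\widehat W^\gamma_{h_{\hat m}} - \widehat W^\gamma_{h_{m^*}}\|_\infty - 2\kappa\, e^{\gamma h_{\hat m}^{-2}} r_n(x + \log M) \right) 1_{\hat m > m^*} + 2\kappa\, e^{\gamma h_{\hat m}^{-2}} r_n(x + \log M)\, 1_{\hat m > m^*}$

$\le \left( B(m^*) + 2\kappa\, e^{\gamma h_{\hat m}^{-2}} r_n(x + \log M) \right) 1_{\hat m > m^*}.$

On the other hand, similarly, we have

$\|\widehat W^\gamma_{h_{\hat m}} - \widehat W^\gamma_{h_{m^*}}\|_\infty 1_{\hat m < m^*} \le \left( B(\hat m) + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M) \right) 1_{\hat m < m^*}.$

Combining the last two displays, and by definition of $\mathcal{L}_\kappa(\cdot)$ in (29), we get

$\|\widehat W^\gamma_{h_{\hat m}} - \widehat W^\gamma_{h_{m^*}}\|_\infty \le \left( B(m^*) + 2\kappa\, e^{\gamma h_{\hat m}^{-2}} r_n(x + \log M) \right) 1_{\hat m > m^*} + \left( B(\hat m) + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M) \right) 1_{\hat m < m^*}$

$\le B(m^*) + B(\hat m) + 2\kappa\, r_n(x + \log M) \left( e^{\gamma h_{\hat m}^{-2}} + e^{\gamma h_{m^*}^{-2}} \right)$

(75) $= \mathcal{L}_\kappa(m^*) + \mathcal{L}_\kappa(\hat m) \le 2 \mathcal{L}_\kappa(m^*),$

where the last inequality follows from the definition of $\hat m$ in (30). By the definition of $B(\cdot)$, we obtain

$\mathcal{L}_\kappa(m^*) = B(m^*) + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M)$

$= \max_{j : j > m^*} \left\{ \|\widehat W^\gamma_{h_{m^*}} - \widehat W^\gamma_{h_j}\|_\infty - 2\kappa\, e^{\gamma h_j^{-2}} r_n(x + \log M) \right\} + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M)$

$\le \max_{j : j > m^*} \left\{ \|\widehat W^\gamma_{h_{m^*}} - E[\widehat W^\gamma_{h_{m^*}}]\|_\infty + \|E[\widehat W^\gamma_{h_{m^*}}] - W_\rho\|_\infty + \|W_\rho - E[\widehat W^\gamma_{h_j}]\|_\infty + \|E[\widehat W^\gamma_{h_j}] - \widehat W^\gamma_{h_j}\|_\infty - 2\kappa\, e^{\gamma h_j^{-2}} r_n(x + \log M) \right\} + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M).$

On the event $\mathcal{E}_\kappa$, it follows that

$\mathcal{L}_\kappa(m^*) \le \max_{j : j > m^*} \left\{ \|\widehat W^\gamma_{h_{m^*}} - E[\widehat W^\gamma_{h_{m^*}}]\|_\infty + \|E[\widehat W^\gamma_{h_{m^*}}] - W_\rho\|_\infty + \|W_\rho - E[\widehat W^\gamma_{h_j}]\|_\infty - \kappa\, e^{\gamma h_j^{-2}} r_n(x + \log M) \right\} + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M).$

As $h_{m^*} > h_j$ for all $j > m^*$, we have $-e^{\gamma h_j^{-2}} \le -e^{\gamma h_{m^*}^{-2}}$. Therefore, on the event $\mathcal{E}_\kappa$, we get

(76) $\mathcal{L}_\kappa(m^*) \le \|E[\widehat W^\gamma_{h_{m^*}}] - W_\rho\|_\infty + \max_{j : j > m^*} \left\{ \|E[\widehat W^\gamma_{h_j}] - W_\rho\|_\infty \right\} + 2\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M).$

From (75) and on the event $\mathcal{E}_\kappa$, we have

$\|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty \le \|\widehat W^\gamma_{h_{\hat m}} - \widehat W^\gamma_{h_{m^*}}\|_\infty + \|\widehat W^\gamma_{h_{m^*}} - W_\rho\|_\infty \le \|\widehat W^\gamma_{h_{m^*}} - W_\rho\|_\infty + 2\mathcal{L}_\kappa(m^*)$

$\le \|\widehat W^\gamma_{h_{m^*}} - E[\widehat W^\gamma_{h_{m^*}}]\|_\infty + \|E[\widehat W^\gamma_{h_{m^*}}] - W_\rho\|_\infty + 2\mathcal{L}_\kappa(m^*)$

$\le \kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M) + \|E[\widehat W^\gamma_{h_{m^*}}] - W_\rho\|_\infty + 2\mathcal{L}_\kappa(m^*).$

Combining the last inequality with (76), we get

$\|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty \le 5\kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M) + 3 \|E[\widehat W^\gamma_{h_{m^*}}] - W_\rho\|_\infty + 2 \max_{j : j > m^*} \left\{ \|E[\widehat W^\gamma_{h_j}] - W_\rho\|_\infty \right\}.$

From Proposition 1, the bias is bounded by a multiple of $t \mapsto t^{r/2 - 1} e^{-\beta t^{-r}}$, an increasing function for sufficiently
small t > 0, and as $h_{m^*} > h_j$ for all $j > m^*$, we can write

$\|\widehat W^\gamma_{h_{\hat m}} - W_\rho\|_\infty \le C \left( \kappa\, e^{\gamma h_{m^*}^{-2}} r_n(x + \log M) + h_{m^*}^{r/2 - 1} e^{-\beta h_{m^*}^{-r}} \right).$


The result comes from (74), the definition of $m^*$.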

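D.3. Proof of Lemma 4. In view of Fatou's Lemma, we have

$\liminf_{|z| \to \infty} z^2 p_0^\gamma(z) \ge \int \liminf_{|z| \to \infty} z^2 p_0(z - x) N^\gamma(x)\,dx \ge \int_{-\sqrt{2\gamma}}^{\sqrt{2\gamma}} \liminf_{|z| \to \infty} z^2 p_0(z - x) N^\gamma(x)\,dx.$

Recall that $\gamma = \frac{1 - \eta}{4\eta} < 1/4$; then for $|z| \ge \sqrt{2\gamma} + 1$ and any $x \in (-\sqrt{2\gamma}, \sqrt{2\gamma})$, it follows by Lemma
3 that $p_0(z - x) \ge c (z - x)^{-2}$. Thus,

$\liminf_{|z| \to \infty} z^2 p_0^\gamma(z) \ge c \int_{-\sqrt{2\gamma}}^{\sqrt{2\gamma}} N^\gamma(x)\,dx = c \int_{-1}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \ge c' > 0,$

where c' > 0 is a numerical constant. Choose now a numerical constant $c_0 > 0$ such that
$\int_{-c_0}^{c_0} p_0(x)\,dx \ge 1/2$; therefore, for any $|z| \le 1 + \sqrt{2\gamma}$ and some numerical constant c'' > 0, we
get

$p_0^\gamma(z) \ge \int_{-c_0}^{c_0} p_0(x) N^\gamma(z - x)\,dx \ge \min_{|y| \le c_0 + 1 + \sqrt{2\gamma}} \{ N^\gamma(y) \} \int_{-c_0}^{c_0} p_0(x)\,dx \ge \frac{1}{2} \min_{|y| \le c_0 + 1 + \sqrt{2\gamma}} \{ N^\gamma(y) \} \ge c'' > 0.$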


D.4. Lemma 6.

Lemma 6. The matrix $\rho^{(m,h)}$ defined in (50) satisfies the following conditions:

(i) Self-adjoint: $\rho^{(m,h)} = (\rho^{(m,h)})^*$.

(ii) Positive semi-definite: $\rho^{(m,h)} \ge 0$.

(iii) Trace one: $\mathrm{Tr}(\rho^{(m,h)}) = 1$.

Proof:

• Note first that $V_{m,h}$ is not a Wigner function; however, it belongs to the linear span of Wigner
functions. Consequently, it admits the following representation

$\frac{1}{\pi} \mathcal{R}[V_{m,h}](x,\phi)\, 1_{(0,\pi)}(\phi) = \sum_{j,k=0}^\infty \tau^{(m,h)}_{j,k}\, \psi_j(x) \psi_k(x)\, e^{-i(j-k)\phi},$

where

(77) $\tau^{(m,h)}_{j,k} = \iint \frac{1}{\pi} \mathcal{R}[V_{m,h}](x,\phi)\, \psi_j(x) \psi_k(x)\, e^{-i(j-k)\phi}\,dx\,d\phi.$

For the sake of brevity, we set from now on $\tau = \tau^{(m,h)}$. Note that the matrix $\rho^{(m,h)}$ satisfies
$\rho^{(m,h)}_{j,k} = \rho^{(0)}_{j,k} + \tau_{j,k}$. Exploiting the above representation of $\tau$, it is easy to see that $\tau_{j,k} = \overline{\tau_{k,j}}$ for
any $j, k \ge 0$. On the other hand, $\rho^{(0)}$ is a diagonal matrix with real-valued entries. This gives (i)
immediately.

• We consider now (iii). First, note that $\mathcal{R}[V_{m,h}](\cdot,\phi)$ is an odd function for any fixed $\phi$. Indeed,
its Fourier transform with respect to the first variable,

$\mathcal{F}_1[\mathcal{R}[V_{m,h}](\cdot,\phi)](t) = \widetilde V_{m,h}(t\cos\phi, t\sin\phi),$

is an odd function of t for any fixed $\phi$. Thus, it is easy to see that $\tau_{j,j} = 0$ for any $j \ge 0$. Since
$\rho^{(0)}$ is already known to be a density matrix, this implies that

$\mathrm{Tr}(\rho^{(m,h)}) = \mathrm{Tr}(\rho^{(0)}) + \mathrm{Tr}(\tau) = 1.$

• Now prove (ii). From (13), we have 

\fk,j(t)\ = n 2 \t\l ]ik (\t\/2), j > k. 

Moreover by Lemma 1, we have 

J , , If 1 _ if 0 < X < \/j + k + 1, 

J ' ,fc 1 ~ 7r \ e-f^^vT+fc+T) 2 if x > \Jj + k + 1. 

Therefore, by the change of variable (t, 4>) into w = ( w\,w 2 ), (77) is such that 


1 


(78) 


\Tj,k\ < I V (t cos </>, t sin <j))\\f(t)\dt = tt JJ \V(w)\l Jtk j dw 

< f \V(w)\dw + f |V(u))|e _ ^"^ _J ) dw = I\+I 2 , 
J\\w\\<Vj J\\w\\>Vj 

where J = j + k + 1. The term Ii can be bounded as follow 

h = aCoh-'e^ 2 [ e- 2 ^\g m {\\w\\ 2 -h- 2 )g{w 2 )\dw 

J \\wUV7 

< acoh~ 1 e /3h 2 [ e^ 2l3M ~\g m (\\w\\ 2 - h~ 2 )\dw, 

J\\w\\ 2 <j 

where Co = \J~kL((3 + 7). 

If k + j + 1 < a^, then I\ = 0. If k + j + 1 > o? m , then 


h < aCnh-^ePb' 


s—2/3IMI 


^a m <\\ w \\<b m dw < aSCoh~ 1 S 2 e^ h ~ e ~ 2/3a - 
(79) < aSCoS 2 h -1 5 2 e~P a ™ < aC 1 5e ~ fiJ , 

where C\ > 0 is a constant depending only on L,f3, 7. Similarly for I 2 , we get 


I 2 = aC n h- l e ph ~ 


\\>V7 


e -2/3||w[ |2 |g m (||w;|| 2 _ h- 2 )g(w 2 )\e^~^ 2 dw 


< aCoh^e^ 2 [ e- 2 / 3 ||u ' l| 2 | 5 m (||«;|| 2 -/i- 2 )|e (l|,u|| -^ J) 2 du;. 

J\\w\\ 2 >J 

If k + j + 1 > 6^, then I 2 = 0, otherwise if k + j + 1 < 6^, we have 

I 2 < aC 0 h~ 1 e l3h 2 [[e~ 2 ^ wll 2 l am< n w n <b e^ w ^ VJ)2 dw 


< a5C' o h- l 5 2 e 0h V 2/3a - < aSC' 1 h- 1 S 2 e ~ l5a ™ 

(80) < aSC[h- 1 6 2 e- i:ib ^e l3S < aC 1 5e> 3S e - 0J . 

Combining (78), (79) and (80), we get for any j ^ k that 

\ T j,k\ < cade 135 e~P^ +k+1 \ 

for some numerical constant c > 0. Since p is an Hermitian matrix (in), it admits real eigenvalues. 
For any eigenvalue A of p, in view of Theorem 4 below, there exists an integer j > 1 such that 


(81) 


X- p 


(o) 


o 0S—20 


— r . 

—. 1 j. 


n \ < M < ca5 Y^v e ~ 

k—1: k^j 

Recall that p^ = p a ’ X for some 0 < a, A < 1 where p a,x is defined in (42). Lemme 2 in the paper 
of Butucea, Gu^a and Artiles (2007) guarantees that 

a “ -T(o; + 1 )j _(1+q) (1 + o(l)), 


Pi i 


(1 — A) c 
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as n —> oo. We note that p A > 0 decreases polynomially with j whereas r 3 decreases exponentially. 
Taking the numerical constant a > 0 small enough in (47) independently of j, we get pjj > ^ > 0. 
Thus p is positive semi-definite. 


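Theorem 4 (Gershgorin Disk Theorem). Let A be an infinite square matrix and let $\mu$ be any
eigenvalue of A. Then, for some $j \ge 1$, we have

$|\mu - A_{jj}| \le r_j(A),$

where $r_j(A) = \sum_{k \ge 1 : k \ne j} |A_{j,k}|$.

Proof: Let $\mu$ be an eigenvalue of A with associated unit eigenvector $v = (v_1, v_2, \dots)$. We have

$\mu v_k = [Av]_k = \sum_{l \ge 1} A_{kl} v_l.$

We set $\bar k = \operatorname{argmax}_{k \ge 1} (|v_k|)$. Then

$(\mu - A_{\bar k \bar k})\, v_{\bar k} = \sum_{l : l \ne \bar k} A_{\bar k l}\, v_l.$

Consequently

$|\mu - A_{\bar k \bar k}| \le \sum_{l : l \ne \bar k} |A_{\bar k l}|\, \frac{|v_l|}{|v_{\bar k}|} \le \sum_{l : l \ne \bar k} |A_{\bar k l}| =: r_{\bar k}(A).$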
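
The way the Gershgorin disks are used in Lemma 6 can be illustrated on a small finite example. The sketch below is only an illustration under stated assumptions: it uses a finite 3 × 3 real symmetric matrix (the theorem above concerns infinite matrices), made of a positive diagonal with trace one plus a small symmetric perturbation with zero diagonal, mimicking $\rho^{(0)} + \tau$; the eigenvalues are computed with a plain cyclic Jacobi iteration written for this purpose.

```python
import math

def jacobi_eigenvalues(a, sweeps=50):
    """Eigenvalues of a small real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def in_some_gershgorin_disk(a, mu, tol=1e-12):
    """True if |mu - a[j][j]| <= sum_{k != j} |a[j][k]| for at least one row j."""
    n = len(a)
    for j in range(n):
        radius = sum(abs(a[j][k]) for k in range(n) if k != j)
        if abs(mu - a[j][j]) <= radius + tol:
            return True
    return False
```

When every diagonal entry exceeds the corresponding row radius, all disks lie in the positive half-line, so every eigenvalue is positive; this is the mechanism behind (ii).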
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